Abstract

The research goal involved developing improved methods for securing Programmable Logic Controller (PLC) devices against unauthorized entry and mitigating the risk of Supervisory Control and Data Acquisition (SCADA) attack by detecting malicious software and/or trojan hardware. A Correlation Based Anomaly Detection (CBAD) process was developed to enable 1) software anomaly detection, discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc., and 2) hardware component discrimination, discriminating between various hardware components to detect malfunctioning, counterfeit, trojan, etc., components.

Defense against software exploitation was implemented by 1) adopting a previously demonstrated capability that provides human-like discrimination of hardware devices using information extracted from intentional Radio Frequency (RF) emissions, and 2) adapting an RF-based verification methodology to exploit information in unintentional PLC emissions to detect anomalous operation resulting from software and/or hardware discrepancies and enhance SCADA security. Operational status verification (normal versus anomalous) is demonstrated using experimentally collected emissions from ten Allen Bradley SLC-500 PLCs executing custom Ladder Logic Programs (LLPs) designed to support the research methodology.

Performance for verification-based software anomaly detection was evaluated using the CBAD process. The CBAD verification process is sequence agnostic and can be used with untransformed Time Domain (TD) sequences or with transformed inputs, such as Hilbert Transform (HT) sequences and RF Distinct Native Attribute (RF-DNA) features. Relative to performance using untransformed TD sequences or RF-DNA features, CBAD performance using HT sequences was superior, with an arbitrary Receiver Operating Characteristic (ROC) curve Equal Error Rate (EER) benchmark of EER_B < 10.0% achieved for all PLC devices at a Signal-to-Noise Ratio (SNR) of SNR = 0.0 dB; this benchmark was not achieved for any PLCs using untransformed TD sequences or RF-DNA features.

Performance for verification-based hardware anomaly detection was evaluated using a Generalized Relevance Learning Vector Quantized-Improved (GRLVQI) process with two input sequences, one derived from TD RF-DNA features (N_Dim = 156 dimensions) and one from Correlation Domain (CD) features (N_Dim = 10 dimensions). For this assessment, ten Allen Bradley PLCs were divided into authorized/authentic and rogue/unknown groups containing five devices each. The GRLVQI model was trained using sequences from all authentic devices, and each device in the unknown group was presented for verification against each of the authentic devices (25 total anomaly assessments). The GRLVQI anomaly detection capability was assessed using each of the two input sequence types, and the resultant performance was comparable. At SNR = 15.0 dB, an average EER ≈ 1.3% was achieved for TD sequences as compared to an average EER ≈ 1.6% for CD sequences; both sequence types satisfied the EER_B < 10.0% benchmark for all PLC devices. While the EER for TD sequences is 0.3 percentage points lower than for CD sequences, the TD sequence has nearly 16 times as many elements as the CD sequence (156 versus 10), and correspondingly greater computational resources would be required in an operational implementation.

RADIO FREQUENCY BASED
PROGRAMMABLE LOGIC CONTROLLER
ANOMALY DETECTION


1. Introduction

This chapter introduces the research topic and outlines the motivation behind the development of the Correlation Based Anomaly Detection (CBAD) process described in later chapters. Section 1.1 provides a brief overview of the operational Supervisory Control And Data Acquisition (SCADA) and Industrial Control System (ICS) topology and vulnerabilities. It is further divided into two subsections: 1) Section 1.1.1, describing the software-based vulnerability picture for SCADA/ICS, and 2) Section 1.1.2, describing the potential for hardware-based security concerns. Section 1.2 provides a brief description of existing research and technologies supporting the current research effort. Section 1.3 summarizes the contributions of the existing research and technologies.

1.1 Operational Motivation

Modern digital computing technology has led to a proliferation of computers into nearly every aspect of daily operations for the United States Air Force (USAF) and the Department of Defense (DOD) as a whole. The modern US military is critically dependent on computer hardware and digital communication systems to carry out its mission, from checking email to ordering needed maintenance parts. The efficiency gained through networked Information Technology (IT) resources comes at the cost of increased vulnerability to malicious cyber attacks. The Air Force Computer Emergency Response Team (AFCERT) is the primary agency responsible for protecting USAF network assets from attack. The AFCERT reported nearly 2 million weekly alerts indicating potential cyber attacks against USAF bases in November 2011 [99], highlighting the magnitude of cyber threats facing networked IT systems. In addition to these potential attacks, there were over 150 verified incidents in 2011 of “hackers” gaining access to information system assets affecting the USAF mission [99]. The threat of attack and compromise of USAF information system assets directly affects nearly all aspects of the USAF mission.

A key aspect of information systems usage is the communication systems linking devices and networks. Data exchange occurs over computer networks (wired and wireless) as well as over civilian communication networks (e.g., cellular/satellite phone networks). The analysis and storage of potentially sensitive data relies, to a large extent, on Commercial Off The Shelf (COTS) products that are either slightly modified for military use or not altered at all. To ensure proper control and verification of the data relevant to the military mission, the devices used to manage the data must be trusted. Various vulnerabilities exist in the communication systems and data processing hardware currently used in military applications. Although methods exist to protect hardware and communication signals from exploitation, such as Anti-Tamper (AT) initiatives for hardware and data encryption for communication systems, these methods are not sufficient to guarantee the authenticity of computing platforms, programs, or communication nodes.

Another, less publicized, area of concern involves hardware-based vulnerabilities. The focus on cheaper semiconductor devices, such as those at the core of SCADA Programmable Logic Controllers (PLCs), has led to a heavy reliance on overseas manufacturing, which increases the risk that potentially damaging trojan or counterfeit devices are deliberately used in PLCs for critical applications [17,82]. For example, the DOD implemented a ban on the use of thumb drives following concerns regarding virus transmission via the flash drive medium [8]. Military Field Programmable Gate Array (FPGA) systems are prone to exploitation given their reconfigurable nature. Even Integrated Circuits (ICs) fabricated for US military use are vulnerable, given that the majority of manufacturing facilities are located overseas. Although research has focused on combating hardware and communication vulnerabilities [53,69,70,90], the verification of a hardware platform, program, or communication node remains critical to protecting and validating the data used in carrying out every aspect of the USAF military mission.

Information Technology systems have also yielded unprecedented levels of automated, precise control of ICS operations for functions ranging from wastewater treatment to nuclear power generation. ICS facilities maintain critical infrastructure capabilities in the civilian and Government sectors. US Government policy states “Private business, government, and the national security apparatus increasingly depend on an interdependent network of critical physical and information infrastructures, including telecommunications, energy, financial services, water, and transportation sectors” [98]. Current ICS architectures are predominantly based on networked digital computers that enable reliable monitoring and control of critical functions within regionally localized and globally distributed operations [84]. One key element of ICS operations is the SCADA system, which provides centralized control and monitoring via PLC devices; PLCs are the gateway through which recent cyber attacks have been orchestrated against high-profile ICS targets [95,105]. The majority of publicized attacks target software-based vulnerabilities to inflict damage [12,13,33,44,60,93,94]. While the software vulnerabilities may lie within a PLC or another SCADA component, PLCs are the last component in the chain and are what operationally implement the kinetic effects caused by a cyber attack.

With such reliance on the critical functions performed by ICS assets and facilities, the SCADA and PLC systems employed must be secured from cyber attack, similar to how major IT systems are currently protected and secured. Unfortunately, a gap exists between the security options available for ICS assets and those available for IT systems. PLCs tend to be special purpose machines and are often outdated by IT standards. Few applications are available for PLCs aside from those needed to perform their specific tasks. A method of detecting altered or anomalous activity on SCADA and PLC hardware is needed to thwart adversarial attacks.

Figure 1.1: OSI 7-layer network model [7].

Consider the Open Systems Interconnection (OSI) model describing the different levels of a system [106], shown in Fig. 1.1. Current detection of unauthorized or anomalous activity on information systems relies on analysis of the data within the Application (Layer 7) or Network (Layer 3) layers of the model. In communication systems such as cellular phone systems and wireless networks, data access and trust relationships are commonly verified via methods operating in the Data Link Layer (Layer 2) of the OSI model. These verification credentials include Media Access Control (MAC) addresses for wireless network access and International Mobile Equipment Identity (IMEI) numbers for cellular networks. These verification methods are far from foolproof: tools and methods exist that allow an individual to modify the verification credentials, enabling adversaries to bypass Data Link Layer control measures altogether [61,73,104]. IT networks employ firewalls and network Intrusion Detection System (IDS) components to detect and block cyber attacks, and IT systems employ virus detection and host-based IDS programs to perform similar functions at the individual computer level. SCADA and PLC systems are special purpose machines and lack the general purpose capabilities of most desktop or server computers in the IT realm. Additionally, it is not uncommon for ICS components, including PLCs, to remain installed and in operation for decades. These PLCs lack the functionality to run virus detection or IDS programs, which exposes the system to threats with the potential to cause substantial physical losses, as in the case of the Stuxnet or Springfield attacks.

The following sections address the research motivation in light of two primary attack vectors used to exploit PLC vulnerabilities: 1) Section 1.1.1 addresses software-based vulnerabilities, and 2) Section 1.1.2 addresses hardware-based vulnerabilities.

1.1.1 Software-Based Vulnerabilities. Network and computer security experts at McAfee predict that 2013 will bring a shift in the cyber warfare picture, including increased activity in which nation states become victims and targets of cyber attacks [63]; McAfee suggests improving SCADA system defense by removing these systems from the production network and placing them on a dedicated stand-alone network. The motivation for removing SCADA systems from the production network is due in part to the number of potential high-value critical infrastructure ICS targets (civilian and military) using SCADA control via an unsecure network. The Air Force Civil Engineering Center (AFCEC) states that ICS assets in the USAF and in industry are “at best insufficiently protected from cyber threats” [100]. In acknowledgement of the criticality of US ICS infrastructure, the US Government has prioritized the defense of these vulnerable assets through a Presidential Directive in 2003 [5] and more recently through an Executive Order in 2013 [66]. Despite the focus on protecting ICS assets from malicious activity, they remain vulnerable: over 20 vulnerability alerts and advisories were issued by the ICS Cyber Emergency Response Team (ICS-CERT) in January 2013 alone [46]. Protecting vital ICS assets from the risk of cyber attack is therefore essential to mitigating the potentially catastrophic consequences of an attack.

SCADA systems typically monitor and control higher-level systems through field-based devices called Remote Terminal Units (RTUs) or PLCs that physically implement the desired functionality. A PLC is a special purpose computer that performs low-level ICS functions, such as collecting sensor data and operating physical valves or switches [84]. While the PLC Operating System (OS) and communication protocols are often proprietary, most current PLCs can operate on a standard network, and it is through these networks that malicious programs are loaded onto vulnerable PLCs. Most electronic devices, including Personal Computers (PCs) and network components, are protected to some degree from cyber attacks through a variety of intrusion detection and/or antivirus programs. This is in sharp contrast to PLC implementations, which have very limited protection options because proprietary design, limited processing power, and limited memory preclude direct use of standard PC and network antivirus programs [82]. Additionally, many PLCs remain in service for decades due to the prohibitive cost of re-engineering SCADA systems. PLCs thus become obsolete and unsupportable relative to IT standards and capabilities, which continually evolve to satisfy consumer demands; this prevents the implementation of typical “bit-level” IT protective measures in PLC devices. ICS facilities remain vulnerable to cyber attack, as evidenced by the recent successful Stuxnet malware-based attack [105]. More recently, sophisticated programs including ICS-specific malware, such as Duqu and Flame, demonstrate a continued need for SCADA and ICS defensive research. These malware programs contain computer code that targets SCADA and ICS functions through vulnerable components [12,13,93,105]. Consequently, there is a vital need for a process that detects malicious code installed on PLCs before the code can execute and cause irreversible, catastrophic effects.

1.1.2 Hardware-Based Vulnerabilities. In addition to the high-profile software-based attacks, concern also exists regarding hardware-based compromise orchestrated through trojan or counterfeit semiconductor or Integrated Circuit (IC) devices. Semiconductor devices are prevalent and form the core of all computer systems in use today, including those related to SCADA systems and ICS infrastructure. Systems that rely on secure semiconductor and IC devices, which are integrated within ICS facilities and used throughout the DOD to process, store, and protect sensitive information, remain vulnerable to tampering by adversaries. Not all forms of compromise are malicious in nature: counterfeit, used, or substandard devices can fail in critical applications, causing damage similar to that of a malicious attack. The general term component substitution refers to replacing a genuine, trusted component with a counterfeit, substandard, or trojan component. This substitution can be made during manufacture, assembly, or transport, or even after operational deployment. Unfortunately, most organizations in the DOD do not have the means to defend against counterfeit or trojan component substitution [4]. Estimates of the losses due to counterfeit semiconductor devices are staggering: approximately $200B USD, with about 10.0% of electronic parts in use being counterfeit substitutions [65]. The intent of substitution varies from malicious exploitation of DOD systems to increasing company profits by using cheaper components. To combat the proliferation of potentially harmful IC devices in DOD applications, the Defense Advanced Research Projects Agency (DARPA) is addressing potential trojans in DOD ICs manufactured in foreign countries as part of the Trusted IC program [17]. This program is technologically in its infancy, and additional work is required to verify IC authenticity using non-destructive, non-disruptive techniques that enable device verification during operation.

1.2 Technical Motivation

Traditional bit-level intrusion detection and antivirus programs monitor activity and assess system status using information in the higher layers of the OSI model [106]. One possible solution is to shift the focus of detection from the Application (APP) and Network (NWK) layers to the Physical (PHY) layer. Detection of anomalous activity in the Physical layer depends on analysis of physical attributes of system operation, such as power consumption, heat, or Radio Frequency (RF) radiation from a specific ICS device. One proposed method detects anomalous behavior caused by trojan hardware contained in an IC package by using Side Channel Analysis (SCA) methods to capture signals from the IC outputs [2]. This method could potentially be extended to identify and categorize the operation of a known device. Although the method provided positive results, it requires exercising all of the expected operations in order to identify them effectively. Additionally, the IC would need to be isolated to minimize the effects of other components on the same circuit board. Other research efforts have detected anomalous operations through power analysis [28,29]. Using a non-contact Electro-Magnetic (EM)-based instantaneous current probe, the operating current is captured and used to estimate the power usage of the IC. The probe must be placed at the power trace for the IC in question, and that location may change between implementations. Variations in power usage due to variations in IC manufacture, device temperature, and other components drawing power from the same power trace can complicate measurement of the targeted IC's current draw. What is needed is a validation method that does not require removal of the IC and can non-destructively analyze operations while limiting interference from other system components. Detection and classification of devices and device operations based on RF attributes and qualities have been successfully demonstrated in a large body of research [9,11,14-16,18,19,21,23-25,27,34,39,42,49,77,79,81,103]. Using attributes of RF emissions provides a means of detecting anomalous activity on a wide range of systems without the limitations imposed by the lack of PLC IDS and Anti-Virus (AV) program capabilities. One proposed method of verifying the identity of either a communication node or a hardware platform is through collection, analysis, and classification of the RF energy emitted by the device. RF fingerprinting can be used to generate unique IDs for a given device based on its physical attributes. A key advantage of RF fingerprints is the relative difficulty of spoofing or altering a device's RF fingerprint compared to spoofing or altering its network or hardware credentials.

At the core of PLCs (and most information systems technologies) are semiconductor IC devices. The materials, processes, and environmental conditions involved in semiconductor manufacturing are subject to variability. These variations result in physical differences between IC devices, even if the devices are designed to be equivalent. Device testing limits the variance among devices sold to consumers by ensuring that the functional characteristics of each device are within a defined tolerance. Functional testing is a well researched and developed field [1,54,55] and is generally limited to verifying that device outputs are correct within the expected variance in clock timing and voltage levels. Within that tolerance are variations in performance that can be detected and quantified using specific test equipment. The idea of Radio Frequency Distinct Native Attributes (RF-DNA) is based on capturing, analyzing, and quantifying variance in RF emissions related to variances in manufactured semiconductor devices. The term RF “fingerprint” is used to describe the RF-DNA values associated with a specific device.

RF fingerprints can be derived from two broad categories of RF emissions: Intentional RF Emissions (IRE) and Unintentional RF Emissions (URE). Substantial research has been conducted using RF energy attributes to produce RF fingerprints for device verification [6,11,35,37,38,57,58,75,80,91,92,97]. IRE describes RF energy that is intentionally broadcast as part of a device’s function. Examples of devices that broadcast IRE include wireless radios, IEEE 802.15 Bluetooth devices, cellular phones, and IEEE 802.11 WiFi networking devices. While wireless communication devices use IRE to perform their primary function, digital hardware devices also produce URE related to logic switching in the device. URE describes RF energy that is unintentionally broadcast during device operation. URE is not beneficial to device operation and is considered a detriment, as it can interfere with normal operations. The operation of IC clock signals is one contributor to the URE broadcast from IC-based devices.

RF  fingerprints  have  been  used  to  identify  and  verify  devices.  The  identification 
of  a  device  is  a  means  of  comparing  a  single  RF  fingerprint  to  a  set  of  established 
fingerprints  and  “classifying”  the  device  as  one  of  the  previously  analyzed  devices 
based  on  a  comparison  of  the  RF  fingerprints.  This  is  a  one-to-many  comparison 
problem,  meaning  a  single  fingerprint  is  compared  to  multiple  classified  fingerprints 
in  order  to  properly  identify  the  device.  The  verification  of  a  device  is  a  means 
of  comparing  a  single  RF  fingerprint  to  a  single  previously  captured  and  analyzed 
fingerprint  and  determining  to  what  extent  the  two  fingerprints  are  similar.  This  is  a 
one-to-one  comparison  problem  meaning  a  single  fingerprint  is  compared  to  a  single 
classified  fingerprint  in  order  to  verify  the  device. 
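The distinction between one-to-many identification and one-to-one verification can be illustrated with a minimal Python sketch. It assumes fingerprints are fixed-length numeric vectors compared by Euclidean distance; the device labels, fingerprint values, acceptance threshold, and function names are hypothetical placeholders, not values or methods from this research.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length fingerprint vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(probe: Sequence[float], gallery: Dict[str, Sequence[float]]) -> str:
    """One-to-many: return the label of the closest enrolled fingerprint."""
    return min(gallery, key=lambda label: euclidean(probe, gallery[label]))


def verify(probe: Sequence[float], reference: Sequence[float], threshold: float) -> bool:
    """One-to-one: accept the claimed identity if the distance is within the threshold."""
    return euclidean(probe, reference) <= threshold


# Hypothetical enrolled fingerprints and an unlabeled probe fingerprint.
gallery = {
    "PLC-A": [0.10, 0.52, 0.33],
    "PLC-B": [0.40, 0.11, 0.70],
    "PLC-C": [0.95, 0.80, 0.05],
}
probe = [0.12, 0.50, 0.30]

print(identify(probe, gallery))                        # PLC-A
print(verify(probe, gallery["PLC-A"], threshold=0.1))  # True
print(verify(probe, gallery["PLC-B"], threshold=0.1))  # False
```

Note that `identify` always returns one of the enrolled labels, even for a device that was never enrolled, whereas `verify` can reject a claim outright.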

The  following  sections  provide  a  brief  overview  of  previous  efforts  in  the  field 
of  RF  fingerprints  for  the  purpose  of  classification  and  verification  for  both  IRE  and 
URE  RF  signal  responses. 

1.2.1 Emission Collection. The use of Physical layer RF characteristics to classify and verify wireless devices or operations has been well researched [6,25,36,38,58,59,68,92,96,97,102,103]. Regardless of the emission type (IRE or URE) being considered, RF fingerprinting and device classification generally involve a basic 5-step process that includes [79,91]:

1.  Signal  Collection 

2.  Burst  Detection 

3.  Feature  Extraction 

4.  RF  Fingerprint  Generation 

5.  Device  Classification 

Each step of the process is tailored to the wireless technology and device characteristics of the Device Under Test (DUT) as specified in the device design specifications. The generic classification process provides a starting point for using RF emissions to discriminate between devices or operations.
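The Burst Detection step can be sketched as a simple amplitude-threshold detector that locates contiguous runs of samples whose magnitude exceeds a threshold. This is an illustrative assumption, not necessarily the detector used in the cited RF-DNA work; the threshold, the minimum burst length, and the function name `detect_bursts` are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_bursts(samples: Sequence[float], threshold: float,
                  min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, for each contiguous run of
    samples whose magnitude exceeds `threshold` for at least `min_len` samples."""
    bursts: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if start is None:  # rising edge: a candidate burst begins
                start = i
        elif start is not None:  # falling edge: close the candidate burst
            if i - start >= min_len:
                bursts.append((start, i))
            start = None
    # A burst still open at the end of the record runs to the last sample.
    if start is not None and len(samples) - start >= min_len:
        bursts.append((start, len(samples)))
    return bursts


# Hypothetical record: two sustained bursts and one single-sample spike.
record = [0.01, 0.02, 0.9, 1.1, -0.8, 0.03, 0.5, 0.02, 0.7, 0.8, 0.9]
print(detect_bursts(record, threshold=0.3, min_len=2))  # [(2, 5), (8, 11)]
```

In practice the threshold would be set relative to the measured noise floor; the fixed value here only keeps the example small.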

The use of Physical layer RF characteristics to classify and verify URE devices has not been as well researched as the IRE case. Research has focused on leveraging differences in IC output signals to verify the authenticity of the physical design, but it does not consider URE RF signals from the device itself [2,53]. Recent research efforts provided some of the initial work in capturing RF signals from IC DUTs for the purpose of classification and verification [10,11]; this work differs from the IRE process primarily in the collection portion of the process.

The targeted RF signals for URE device exploitation differ from those for IRE device exploitation in that there is no specified design for the signal, as there is for wireless broadcast standards. Additionally, the URE signal is not intentionally broadcast, so its average signal power is significantly lower than that of an IRE signal. The signal is collected using an RF probe instead of an antenna. The collection specifics and configuration (such as bandwidth and target frequency) are largely determined by the DUT clock frequency and are developed empirically based on observation of captured RF signals.

1.2.2 Fingerprint Generation. Once the signal for an IRE or URE DUT has been collected, sampled, filtered, and stored, the fingerprint generation step of the process is performed. The specific fingerprint generation process considered is the one used in AFIT’s RF-DNA work [10,58,79,103]. Fingerprint generation is largely device agnostic: regardless of whether the signal was collected from an IRE or URE DUT, the high-level methods used to generate the fingerprint are identical. Changes to the process are limited to the configuration of the tools used to generate the fingerprints.

The fingerprints are based on statistical attributes of signal characteristics such as amplitude, frequency, and/or phase. The statistical attributes include standard deviation, variance, skewness, and kurtosis. Prior to calculating the signal characteristics and statistical attributes, a variety of transforms can be applied to the collected, sampled discrete signal, depending on the DUT signal qualities.
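As a sketch of how such statistics might be computed, the Python example below forms the analytic signal of a real-valued sequence with an FFT-based Hilbert transform, derives instantaneous amplitude, phase, and frequency, and computes the standard deviation, variance, skewness, and kurtosis over equal-length subregions plus the full sequence. The subregion count, the use of non-excess kurtosis, the synthetic test tone, and the function names are assumptions for illustration; the exact RF-DNA configuration used in the cited work may differ.

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal x + j*H{x}, where H is the Hilbert transform.
    Positive-frequency bins are doubled and negative-frequency bins zeroed."""
    n = x.size
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def moment_stats(v: np.ndarray) -> list:
    """Standard deviation, variance, skewness, and (non-excess) kurtosis of v."""
    c = v - v.mean()
    var = float(np.mean(c ** 2))
    if var == 0.0:
        return [0.0, 0.0, 0.0, 0.0]
    skew = float(np.mean(c ** 3) / var ** 1.5)
    kurt = float(np.mean(c ** 4) / var ** 2)
    return [var ** 0.5, var, skew, kurt]


def rf_dna_fingerprint(x: np.ndarray, n_regions: int = 4) -> np.ndarray:
    """Concatenate moment statistics of instantaneous amplitude, phase, and
    frequency over n_regions equal subregions plus the full sequence."""
    z = analytic_signal(np.asarray(x, dtype=float))
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    frequency = np.diff(phase) / (2 * np.pi)  # cycles per sample
    features = []
    for characteristic in (amplitude, phase, frequency):
        regions = np.array_split(characteristic, n_regions) + [characteristic]
        for region in regions:
            features.extend(moment_stats(region))
    return np.array(features)


# Hypothetical noisy tone standing in for a collected, sampled emission.
rng = np.random.default_rng(0)
n = np.arange(400)
x = np.cos(2 * np.pi * 0.05 * n) + 0.05 * rng.standard_normal(n.size)
fp = rf_dna_fingerprint(x)
print(fp.shape)  # (60,)
```

With four subregions plus the full sequence, three characteristics, and four statistics, the fingerprint has 3 × 5 × 4 = 60 elements; changing `n_regions` changes the fingerprint dimensionality accordingly.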

1.2.3 Device Classification. A majority of existing RF fingerprinting research involves classifying the DUT based on previously examined data sets. The process involves analyzing RF fingerprints for known devices; fingerprints from the known devices are used to train software known as a classifier. In essence, training establishes the fingerprint characteristics used to align an unknown DUT fingerprint with a previously established device class.

Classification of devices is a one-to-many comparison that typically leads to a DUT being classified as one of the available known devices. The research goal, however, is to verify that a PLC is operating “normally”. One difficulty in classification is defining linear (when possible) or non-linear boundaries that separate class fingerprints with a certain degree of accuracy. The problem of verifying that a PLC is operating normally can be tackled as a two-class problem: a normal operating condition class or an anomalous operating condition class. It is easier to define a linear boundary for a two-class problem than for a multi-class problem. However, a more direct and simpler solution for monitoring PLCs for anomalous activity may be to use verification instead of classification.

1.2.4 Device ID Verification. One goal of capturing RF emissions and extracting RF fingerprints is to verify a device’s bit-level ID. This is related to device verification, which is commonly used for granting network access. Verification is a one-to-one comparison of an unknown DUT fingerprint to a known device fingerprint, with the goal of determining whether the unknown DUT is the known device. This process can be compared to using a photo identification card to verify an individual’s identity.
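A one-to-one verification decision can be sketched as a similarity score between the claimed device's enrolled fingerprint and the unknown DUT fingerprint, followed by a threshold test. The Pearson correlation score, the 0.9 threshold, and the synthetic data below are assumptions chosen for illustration only.

```python
import numpy as np


def verify(reference, candidate, threshold=0.9):
    """One-to-one check: is `candidate` the device described by `reference`?

    Uses the Pearson correlation coefficient as a similarity score (an assumption
    for illustration) and accepts the claimed identity if it meets `threshold`.
    """
    score = float(np.corrcoef(reference, candidate)[0, 1])
    return score >= threshold, score


rng = np.random.default_rng(2)
reference = rng.normal(size=256)                  # enrolled fingerprint for the claimed device
genuine = reference + 0.1 * rng.normal(size=256)  # same device, new noisy observation
impostor = rng.normal(size=256)                   # a different device claiming the same ID
accept_genuine, score_genuine = verify(reference, genuine)
accept_impostor, score_impostor = verify(reference, impostor)
```

Unlike the classifier, this check can answer "no": an impostor fingerprint is rejected rather than being forced into the nearest known class.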

Using RF fingerprints for DUT ID verification is not as well researched as using RF fingerprints for device classification. Previous research efforts demonstrated the use of RF fingerprints to verify PIC microcontroller semiconductor devices [11] and wireless devices [76]. The process of comparing RF fingerprints to verify a device’s ID parallels the procedures used for biometric human ID verification. Biometric classification and verification provide a well-established framework that is well suited to the challenge of verifying PLC operations [48].

Following the general biometric verification process using RF fingerprints, previous researchers verified specific PIC microcontroller devices with better than 99.5% accuracy [11]. This success highlights the potential applicability of verification-based methods for assessing PLC device operational state (normal or anomalous).

1.2.5 Correlation and Matched Filtering. Previous RF fingerprinting processes relied heavily on classification methods. While effective, these classification methods can become computationally expensive to implement for a large number of classes and/or RF fingerprint characteristics. Work continues, and multiple efforts aim to quantify and reduce the computational complexity of classification processes [3,43]. The complexity of classification processes is a concern when implementing them on information systems with limited processing power, such as mobile platforms or systems with power constraints.

One potential alternative approach involves using relatively simple correlation-based methods for classification. Correlation is a key function commonly used in optimal implementations of matched filtering for estimating digital communication symbols [72,85]. Additionally, the correlation function has found use in image processing and other fields that require identifying signals where noise may be an issue [20]. Correlation is conceptually a straightforward function with a well-defined complexity. Given its simplicity and predictable computational complexity, correlation provides an attractive alternative for classification.
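The matched-filter use of correlation can be illustrated by locating a known template buried in noise. The correlation output peaks at the offset where the template aligns with the received samples. The template, noise level, and offset below are hypothetical. `numpy.correlate` in `"valid"` mode evaluates the sliding sum of products and conjugates its second argument for complex inputs.

```python
import numpy as np


def matched_filter_detect(received, template):
    """Correlate `received` against a known `template` (a matched filter for white noise)
    and return the offset of the strongest match together with all correlation scores."""
    scores = np.correlate(received, template, mode="valid")
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(0)
template = rng.choice([-1.0, 1.0], size=64)  # known reference waveform
received = rng.normal(scale=0.5, size=300)   # noise-only background
true_offset = 117
received[true_offset:true_offset + template.size] += template  # bury the template in noise
offset, scores = matched_filter_detect(received, template)
```

The cost is one multiply-accumulate per template sample per candidate offset, which illustrates the predictable computational complexity noted above.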

1.3  Research  Contributions 

The research goal involved expanding the knowledge base of Physical layer methods being developed to reliably detect anomalous and/or malicious activity within ICS components. Specifically, the research objectives included developing a general verification-based anomaly detection approach to support both 1) software anomaly detection, discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc., and 2) hardware component discrimination, discriminating between various hardware components to detect malfunctioning or counterfeit, trojan, etc., ICs. As summarized in Table 1.1, AFIT research contributions in the Radio Frequency Intelligence (RFINT) field have been made in several technical areas. Previously undefined acronyms used in the table include: Time Domain (TD), Spectral Domain (SD), Correlation Domain (CD), Multiple Discriminant Analysis/Maximum Likelihood (MDA/ML), Generalized Relevance Learning Vector Quantized-Improved (GRLVQI), and Learning From Signals (LFS).
1.4 Document Organization

The remaining chapters are organized as follows. Chapter 2 provides background information on SCADA and ICS systems, PLCs, Ladder Logic Programs (LLPs), network/PLC vulnerabilities, spurious RF signal collection, post-collection processing, the correlation operation, the Hilbert transform, and device verification. Chapter 3 details the methodology used for this research effort, including signal collection and processing, the CBAD process, the RF-DNA process, the specific devices and LLPs used for this research, and the verification metrics used to measure performance. Chapter 4 presents the results of the Chapter 3 methodologies, including the verification performance of the CBAD and RF-DNA processes for TD and Hilbert transformed waveforms. Chapter 5 summarizes the research results and identifies potential future research efforts.
Table 1.1: Relational mapping between RFINT technical areas in previous related work and current AFIT research contributions. An X denotes a specific area addressed.

Technical Area | Previous Work: Addressed | Previous Work: Ref# | Current Research: Addressed | Current Research: Ref#
TD Features | X | [57,58,76,77], [91,92,102,103] | X | [86-89]
SD Features | X | [10,11,81,103] | |
CD Features | X | [91,92] | X | [86-89]
Emission Type: Intentional (IRE) | X | [57,58,76,81], [91,92,102,103], [21,39,40,42] | |
Emission Type: Unintentional (URE) | X | [9-11] | X | [86-89]
Emission Type: Burst | X | [57,58,76,81], [91,92,102,103], [21,39,40,42] | |
Emission Type: Continuous | X | [9-11] | X | [86-89]
Emission Type: High SNR | X | [57,58,76,81], [91,92,102,103], [21,39,40,42] | |
Emission Type: Low SNR | X | [9-11] | X | [86-89]
Classification/Verification Processes: MDA/ML | X | [57,58,77,81], [91,92,102,103], [9-11,21] | |
Classification/Verification Processes: GRLVQI | X | [57,58,77,81] | X |
Classification/Verification Processes: LFS | X | [39-42] | |
Dimensional Reduction Analysis (DRA): MDA/ML | X | [39,57,58,77,81] | |
Dimensional Reduction Analysis (DRA): GRLVQI | X | [56,77,81] | X |
Dimensional Reduction Analysis (DRA): LFS | X | [39-42] | |
Verification: Electronic Components | X | [9-11] | X | [89]
Verification: Authorized Wireless Devices | X | [21,77,81] | |
Verification: Rogue Wireless Devices | X | [21,77,81] | |
Verification: Device Operations | | | X | [86-88]
2. Background

This chapter provides background information on topics associated with the research in support of developing a single verification-based anomaly detection approach supporting: 1) software anomaly detection, discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc., and 2) hardware component discrimination, discriminating between various hardware components to detect malfunctioning or counterfeit, trojan, etc., Integrated Circuits (ICs). For the purpose of this research, verification is the validation of a claimed identity for either an operating condition or a hardware component.

Section 2.1 provides background on Supervisory Control and Data Acquisition (SCADA) and Industrial Control System (ICS) and outlines the relationship between Programmable Logic Controller (PLC) devices and the Ladder Logic Programs (LLPs) used to control them. Section 2.2 provides background related to SCADA and ICS vulnerabilities. Section 2.3 outlines the general emission collection and post-collection processing used for Time Domain (TD) signals. Section 2.4 describes the two primary signal processing methods implemented in the research: correlation in Sect. 2.4.1 and the Hilbert transform in Sect. 2.4.2. The chapter concludes with Sect. 2.5, which details the verification-based anomaly detection process and the Receiver Operating Characteristic (ROC) curve metrics used to quantify verification process performance.

2.1  SCADA  and  ICS  Applications 

As used in this document, the term SCADA refers to the entire collection of hardware, software, and network elements that directly support monitoring and control of ICS functions and facilities. ICS functions and facilities include, but are not limited to, manufacturing, power generation, wastewater treatment, and transportation control. SCADA systems are constructed hierarchically, with supervisory systems providing monitoring and top-down control of field devices such as PLCs and Remote Terminal Units (RTUs). Field devices are used to collect telemetry, which may be used to control field device operations or transmitted to a Human Machine Interface (HMI) for observation and recording. PLC functionality is controlled through LLPs, which are computer programs written in a PLC-specific programming environment. While HMIs play an important part in overall SCADA functionality and operation, PLCs and their controlling LLPs were the focus of this research effort.

2.1.1 Programmable Logic Controller (PLC). PLC devices are used to implement low-level functions within a SCADA system. At the simplest level, PLCs collect various sensor inputs, run LLP operations using the input values, and assign outputs based on the program results. A PLC device typically comprises a microprocessor/microcontroller, associated Random Access Memory (RAM) and firmware for executing the LLPs, input connections for collecting sensor data, output connections for controlling physical electromechanical devices (relays, valves, motors, etc.), and communication connections for interfacing with other devices or for direct human interaction.

Relative to current mainstream Information Technology (IT) products, the microprocessor, RAM, and firmware used in a majority of currently deployed PLCs are outdated and lag in performance. Thus, they are not capable of executing programs such as the Host-based Intrusion Detection System (HIDS) software commonly used in IT applications to provide internal defense against malicious or unauthorized programs. PLCs are installed within a variety of physical environments that prioritize robustness over computational capability and require relatively simple hardware with demonstrated reliability and resilience to harsh environmental effects. The unique hardware of PLCs necessitates specially designed Operating System (OS) software for interfacing between the hardware and the user-implemented LLPs. The defensive security programs and processes commonly implemented in traditional IT systems are not applicable to PLCs because of the hardware differences between traditional IT systems and PLCs [82]. Thus, SCADA field devices do not benefit from the large body of research and technologies aimed at improving IT security.

2.1.2 Ladder Logic Program (LLP). LLP implementation allows users to control the processing of PLC inputs and outputs. The LLP language is unique to the PLC/SCADA environment and is largely based on the physical design of the relays that were used to control ICS functionality before PLCs were introduced. Figure 2.1 shows an example of an LLP, as programmed in the Allen Bradley RSLogix® programming environment, consisting of Move (MOV) and Square Root (SQR) LLP operations. As presented, these programs are structured with inputs on the left and operations/outputs on the right. Apart from the branching capability inherently supported in PLCs, LLPs fundamentally operate in sequential order. For the PLC devices considered in this research, execution of the experimental LLPs was strictly sequential, with no recursive calls or nested functions. This ensured that PLC operation was deterministic and that the resultant research conclusions were based on experimentally repeatable execution.


[Figure 2.1 image: ladder rungs terminating at the End of Scan rung]

Figure 2.1: Representative LLP constructed in the Allen Bradley RSLogix® programming environment, consisting of a single MOV and SQR operation. Program rungs in the ladder are executed sequentially from left-to-right, top-to-bottom.
LLPs are inherently repetitive, with each execution cycle beginning at the top LLP rung and ending at the last program rung. Each execution of the LLP is called a scan. For each LLP scan, all inputs are first processed and read into memory. The outputs are then logically computed and stored in internal registers based on the logic of the LLP. The final step of the LLP scan is to assign all computed outputs to the actual, physical outputs of the device. The PLC then begins the next scan at the top rung of the LLP. While branches may exist in the LLP, recursive calls and do-while loops are not permitted, because the scan is fundamentally a linear left-to-right, top-to-bottom progression through the LLP.
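The three-phase scan described above can be modeled in a few lines of Python. This is a hypothetical software model for intuition only. It is not RSLogix code, and the rung functions only loosely mirror the MOV and SQR operations of Fig. 2.1. Real SLC-500 data types, addressing, and rounding behavior are not modeled.

```python
import math


def run_scan(physical_inputs, rungs, registers):
    """One PLC scan: input scan, program scan, output scan (hypothetical model)."""
    # 1) Read every physical input into an input image before any logic runs.
    input_image = dict(physical_inputs)
    output_image = {}
    # 2) Evaluate each rung exactly once, strictly top-to-bottom (no loops or recursion).
    for rung in rungs:
        rung(input_image, registers, output_image)
    # 3) Commit all computed outputs together at the end of the scan.
    return dict(output_image)


def mov_rung(inputs, registers, outputs):
    """Analogue of a MOV instruction: copy the input value into a working register."""
    registers["work"] = inputs["sensor"]


def sqr_rung(inputs, registers, outputs):
    """Analogue of an SQR instruction: square root of the working register to an output."""
    outputs["root"] = math.sqrt(registers["work"])


program = [mov_rung, sqr_rung]  # rung order matters: MOV must run before SQR
registers = {}
history = [run_scan({"sensor": value}, program, registers) for value in (16.0, 9.0)]
# history == [{"root": 4.0}, {"root": 3.0}]
```

Because each scan visits the same rungs in the same order, identical inputs always produce identical outputs. This is the deterministic, repeatable behavior the experimental LLPs were designed to preserve.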

2.1.3 Human Machine Interface (HMI). HMIs provide a means for users to observe, monitor, and control ICS functions. HMIs are fundamentally software packages installed and executed on standard Personal Computers (PCs). HMI software is programmed to interact with PLCs and other field devices through IT network media using SCADA communication protocols such as Object Linking and Embedding (OLE) for Process Control (OPC). While PLCs represent one means through which malicious events can be physically enacted upon the system, HMIs represent an easy means for malicious software to be loaded onto victim PLCs [12,13,93,94]. For example, attacks similar to Stuxnet obfuscate the operator’s view of the victim PLC status by using altered LLPs that replace legitimate LLPs stored on the PLC [105]. Despite this vulnerability, HMIs remain essential. They provide valuable insight into ICS facility functions by assimilating data from various RTUs and PLCs and presenting it to ICS facility operators in a customizable format.

While HMIs pose a potential vulnerability, the fact that a majority of HMIs are built from standard IT components significantly mitigates the threat. Widely available IT security programs, tools, and methods are applicable to the majority of HMI systems. Additionally, the communication networks consist of Commercial Off-The-Shelf (COTS) IT devices and protocols. Proprietary protocols may operate between field devices, but these protocols typically run over standard IT infrastructure and can be monitored for potentially destructive activity in the same way standard protocols are monitored [64].

2.2 SCADA and ICS Vulnerabilities

SCADA and ICS systems remain vulnerable to a variety of attacks and methods of compromise. The ICS Cyber Emergency Response Team (ICS-CERT) maintains a list of hundreds of vulnerabilities specifically affecting SCADA and ICS components and systems [45]. The vulnerabilities affect multiple devices within the SCADA hierarchy, including both PLCs and HMIs. HMI hardware implementations are largely based on traditional IT systems and benefit from the substantial research in IT network defense. HIDS programs offer a means to detect unauthorized programs on HMI systems, and Antivirus (AV) software provides another avenue for detecting and removing malicious code on traditional IT systems. As stated in Sect. 2.1.1, PLCs are largely vulnerable in this respect because their design lacks inherent defensive software or methods. Malicious threats such as Stuxnet exploit this vulnerability to inject malicious unknown code into SCADA systems.

PLC field devices pose a particularly alarming threat because of their lack of general-purpose processing capability, the proprietary nature of the devices, and the long tech-refresh cycles for installed devices. The LLPs introduced in Sect. 2.1.2 are focused on providing industrial monitoring and control functions. They are not general purpose enough to effectively implement AV or IDS functions, which leaves the PLC devices vulnerable.

Extensive research effort has been applied to securing traditional Information Technology (IT) systems and networks by controlling access and detecting malicious programs, or malware, in the higher layers of the Open Systems Interconnection (OSI) network model, i.e., the Data Link Layer (DLL) through the Application (APP) layer. Bit-level credentials, such as Media Access Control (MAC) addresses and International Mobile Equipment Identity (IMEI) numbers, control network access, while AV and Intrusion Detection System (IDS) software protects IT systems from malware. Had IT protection methods been available for PLCs at the time of attack, the adverse effects of the Stuxnet [105] and Duqu [93] malware-based attacks might have been mitigated. The programs that defend commodity IT assets are generally not implementable on the majority of PLCs and other ICS components within SCADA systems. Therefore, the goal is to find alternative defense mechanisms that can be implemented on currently vulnerable PLCs and other ICS components. One specific alternative has emerged that exploits Radio Frequency (RF) emissions to achieve human-like discrimination of hardware devices. It uses PHY layer information extracted from either unintentional or intentional RF emissions to augment bit-level network access control measures [11,18,25,58,79,102].

2.3  RF  Emission  Collection 

Signal collection is the process of capturing and storing the Device Under Test (DUT) signal. Collection can be accomplished either with an antenna, for Intentional RF Emissions (IRE) exploitation, or with an EM probe, for Unintentional Radiated Emanation (URE) exploitation. In either case, the equipment and process must be tailored to the specific target and the desired signal attributes. Previous RF-based research can be broadly categorized by the type of emission(s) exploited. IRE energy comes from devices that intentionally broadcast or emit RF radiation in support of their primary “by design” function (e.g., cellular phones, pagers, wireless Local Area Network (LAN) adapters, etc.). URE energy comes from devices that emit RF radiation as an unintended by-product or “side-effect” of their primary function. The majority of electrical Integrated Circuit (IC) components emit some amount of RF radiation during normal operation.

IRE RF signals are broadcast at a carrier frequency set by the design of the DUT broadcast technology. The carrier frequency is typically much higher than the bandwidth of the DUT RF signal [85]. For example, IEEE 802.15 Bluetooth communications use the Industrial, Scientific, and Medical (ISM) RF band at a carrier frequency of $f_c = 2.4$ GHz, but the Bluetooth signal has a bandwidth of less than 500 MHz [47]. Sampling and storing such signals at the Nyquist-Shannon criterion for the carrier frequency ($f_s > 2f_c$) would require substantial storage and bandwidth. Therefore, the signal is down-converted after initial capture to a lower frequency with lower storage requirements. The down-converted signal satisfies the Nyquist-Shannon criterion for the DUT RF signal once the carrier frequency is accounted for.
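The effect of down-conversion can be shown at scaled-down frequencies. Mixing a real RF tone with a complex exponential at the carrier frequency moves the tone to baseband, after which a much lower sample rate suffices. The rates below (1 MHz sampling, 200 kHz carrier, 5 kHz offset) and the moving-average filter are illustrative assumptions. They are not Bluetooth or receiver settings.

```python
import numpy as np


def downconvert(rf, fs, fc, decimation):
    """Mix a real RF record to complex baseband, low-pass it, and decimate.

    Multiplying by exp(-j*2*pi*fc*t) shifts the carrier to 0 Hz; a simple
    length-`decimation` moving average stands in for the anti-alias filter.
    """
    t = np.arange(rf.size) / fs
    mixed = rf * np.exp(-2j * np.pi * fc * t)
    smoothed = np.convolve(mixed, np.ones(decimation) / decimation, mode="same")
    return smoothed[::decimation], fs / decimation


# Scaled-down illustration: a 5 kHz offset on a 200 kHz carrier, sampled at 1 MHz.
fs, fc, f_offset = 1.0e6, 200.0e3, 5.0e3
t = np.arange(10_000) / fs
rf = np.cos(2 * np.pi * (fc + f_offset) * t)
baseband, fs_bb = downconvert(rf, fs, fc, decimation=20)  # 50 kHz output rate
spectrum = np.abs(np.fft.fft(baseband))
freqs = np.fft.fftfreq(baseband.size, d=1 / fs_bb)
peak_hz = float(freqs[np.argmax(spectrum)])
```

Decimating by 20 stores twenty times fewer samples while preserving the 5 kHz component. The same principle makes storing a narrowband signal riding on a 2.4 GHz carrier practical.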

For a collection against an IRE device, critical collection aspects include the antenna used to receive the signal and the receiver used to format and store it. Even with the same receiver, different signal technologies require different collection settings, such as center frequency, filter bandwidth, sampling rate, and signal gain. Storage and processing also require specific collection methods and settings. Environmental conditions also affect the collection, because temperature and RF interference both impact the captured signal. Previous RF fingerprinting efforts have taken steps to limit environmental impact, such as operating the collection receiver from within a climate-controlled automobile [25] or performing the collections indoors [79]. Although an uncontrolled environment makes IRE signal collection more challenging, other researchers have successfully collected and fingerprinted signals in an operational test environment with limited or no control over potential interference or environmental effects [34,102].

Collecting against URE devices parallels collecting against IRE devices but requires different equipment and collection methods. Both invasive and noninvasive techniques exist for RF capture from URE devices. Given the goal of RF fingerprinting, which is to accurately identify hardware or operations, most collections require noninvasive techniques. One method of noninvasively collecting a URE signal is to use an Electromagnetic (EM) probe. Previous related research has exploited differences in electrical responses collected directly from various IC connecting pins (power, timing, control, data, etc.) to verify physical design authenticity [2,28]; this is a physical contact assessment. These methods differ from the RF-based method adopted for this research, in which RF emissions are collected from operating ICs using a near-field probe placed in close proximity to the DUT [11,52]; this is a non-contact electromagnetic assessment.

The RF signal must still be sampled for storage and analysis. The signal characteristics depend on physical attributes of the DUT rather than on a specified design, so the sampling rate is determined empirically by analyzing the signals collected from the DUT. As with IRE collection, the sampled signal may be filtered by a Low Pass Filter (LPF) to increase the overall Signal-to-Noise Ratio (SNR). Because the URE broadcast is not based on a specific design, the LPF parameters are determined empirically from the spectral characteristics of the DUT URE RF signals. Another difference between the URE and IRE collection processes is burst detection. The IRE signals considered are transmitted in bursts containing well-documented regions. The structure of URE signals, by contrast, is an artifact of the operational design and does not follow any structured arrangement or organization. Therefore, much of the time range for the Region Of Interest (ROI) is determined empirically by visual examination of collected signals.
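The SNR benefit of low-pass filtering can be illustrated with a narrowband component in white noise. The filter passes the component but removes most of the broadband noise power. The windowed-sinc design, the 0.05 cycles/sample cutoff, and the synthetic signals are assumptions for illustration. They are not the empirically chosen LPF values.

```python
import numpy as np


def lowpass_fir(cutoff, num_taps=101):
    """Windowed-sinc low-pass FIR; `cutoff` is in cycles/sample (0 < cutoff < 0.5)."""
    m = np.arange(num_taps) - (num_taps - 1) / 2
    h = 2 * cutoff * np.sinc(2 * cutoff * m) * np.hamming(num_taps)
    return h / h.sum()  # normalize for unity gain at DC


def snr_db(signal, noise):
    """Ratio of mean signal power to mean noise power, in dB."""
    return 10 * np.log10(np.mean(signal ** 2) / np.mean(noise ** 2))


rng = np.random.default_rng(3)
n = np.arange(20_000)
clean = np.sin(2 * np.pi * 0.01 * n)        # narrowband component of interest
noise = rng.normal(scale=1.0, size=n.size)  # broadband (white) noise
h = lowpass_fir(cutoff=0.05)
# The filter is linear, so filtering each part separately shows its effect on each.
snr_before = snr_db(clean, noise)
snr_after = snr_db(np.convolve(clean, h, mode="same"), np.convolve(noise, h, mode="same"))
```

With a cutoff at one tenth of the Nyquist band, roughly 90% of the white-noise power is removed. The SNR therefore improves by about 10 dB, while the in-band component is essentially unchanged.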

As mentioned previously, the equipment used to collect RF signals in support of fingerprinting differs between collections. For example, signal-specific equipment would include tuned antennas for IRE collection or EM detection equipment, such as a near-field probe, for URE collection. In addition to the equipment required to capture the RF signals at the physical layer, the signal data must be collected and stored for future processing.

The URE fingerprinting research field has less subject matter and research than the IRE research field. There have been no fewer than four AFIT research efforts in the IRE field related to GSM, 802.11x, and 802.16 technologies. To the best of the author’s knowledge, there has been only one dedicated AFIT research effort related to URE fingerprinting. Therefore, while the RF Signal Intercept and Collection System (RFSICS) collection metrics referenced above span multiple research efforts and wireless technologies, the AFIT-specific metrics for URE fingerprinting are obtained from the previous PIC microcontroller work [10]. For URE data collection, recent AFIT efforts have used a Riscure near-field probe in place of the antenna used for IRE and have captured and stored the data using a LeCroy 104-Xi-A oscilloscope. A filter placed between the near-field probe and the oscilloscope removed signals above 1 GHz. Since URE devices do not intentionally broadcast an RF signal at an advertised frequency, as IRE devices do, the data collection settings are based on clock cycles and empirical results. The target PIC devices operate at a clock rate of 29.48 MHz. However, the collection was performed at a sampling rate of 2.5 Gsps (satisfying the Nyquist sampling criterion for signals below 1.25 GHz) to allow post-collection simulations using the extra data [10].
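A quick arithmetic check of the quoted collection settings follows. The 2.5 Gsps rate places the Nyquist frequency at 1.25 GHz, above the 1 GHz filter cutoff, and yields roughly 85 samples per cycle of the 29.48 MHz PIC clock. The variable names are illustrative.

```python
# Collection settings quoted above for the PIC microcontroller work [10].
sample_rate_hz = 2.5e9    # oscilloscope sampling rate (2.5 Gsps)
filter_cutoff_hz = 1.0e9  # in-line filter removes content above ~1 GHz
clock_rate_hz = 29.48e6   # PIC device clock

nyquist_hz = sample_rate_hz / 2              # highest frequency representable without aliasing
filter_fits = filter_cutoff_hz < nyquist_hz  # filtered emissions stay below Nyquist
samples_per_clock = sample_rate_hz / clock_rate_hz

print(f"Nyquist frequency: {nyquist_hz / 1e9:.2f} GHz")    # 1.25 GHz
print(f"Filter below Nyquist: {filter_fits}")               # True
print(f"Samples per clock cycle: {samples_per_clock:.1f}")  # 84.8
```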

2.3.1 Near-Field RF Probe. RF energy can be collected using a typical far-field antenna (common for IRE collections) or in the near field using a specific variant called an RF probe (common for URE collections). The near-field RF probes used for this research were manufactured by Riscure and consist of a tuned, calibrated conductive coil and a low-noise amplifier. RF probe performance is primarily characterized by bandwidth and spatial resolution. Bandwidth is the frequency range over which the probe is sufficiently sensitive to collect RF emissions of interest, and spatial resolution is the physical area over which the probe maintains this sensitivity.

2.3.2 Digital Sampling. A continuous, real-world signal contains an infinite number of values between any two points in time and would require an infinite amount of storage and processing power to analyze. Therefore, the continuous RF signal broadcast from the DUT must be sampled for storage and analysis. Sampling converts the continuous signal to a discrete representation. The collection hardware is configured to sample the RF signal at a rate meeting the requirements of the Nyquist-Shannon theorem [67].

Following near-field probe collection, the analog emission responses are digitally sampled for subsequent storage and post-collection processing. Under the Nyquist criterion, the collected analog response must be sampled at a rate of $f_s > 2 f_M$, where $f_M$ is the maximum frequency extent of the RF response. For the emissions collected here, the maximum frequency extent was limited by placing an in-line RF filter between the Riscure near-field probe and the LeCroy oscilloscope (o’scope) used as the receiver. The RF filter bandwidth is determined by the spectral points at which the signal power $S$ is attenuated by 3.0 dB.
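The 3.0 dB bandwidth definition can be applied to a sampled magnitude response as shown below. The fourth-order Butterworth-style response with a 1 GHz corner is only an assumed model for illustration. It is not the measured response of the in-line filter.

```python
import numpy as np


def bandwidth_3db(freqs_hz, power_db):
    """Return the upper -3 dB point of a low-pass response.

    Assumes the response falls off monotonically from its passband peak, so the
    points within 3.0 dB of the peak form one contiguous band starting at DC.
    """
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    power_db = np.asarray(power_db, dtype=float)
    within_3db = power_db >= power_db.max() - 3.0
    return float(freqs_hz[within_3db].max())


# Illustrative model only: 4th-order Butterworth-style magnitude with a 1 GHz corner,
# |H(f)|^2 = 1 / (1 + (f / fc)^8), sampled every 1 MHz from 0 to 2 GHz.
fc = 1.0e9
freqs = np.linspace(0.0, 2.0e9, 2001)
power_db = -10 * np.log10(1 + (freqs / fc) ** 8)
bw = bandwidth_3db(freqs, power_db)  # just under 1 GHz on this 1 MHz grid
```

At the corner frequency this model is attenuated by about 3.01 dB, so the last grid point within 3.0 dB lies just below 1 GHz.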

In addition to the sampling frequency $f_s$, another critical aspect of the sampling process is quantization of the TD signal samples. Quantization maps a continuous analog variable (the collected RF emission) into a discrete digital variable. The voltage range and bit depth define the analog-to-digital mapping. For example, an input voltage range of $V \in [0, 2.55]$ V mapped to an 8-bit digital variable can distinguish $2^8 = 256$ discrete voltage levels in quantization increments of $q = 2.55/(256 - 1) = 0.01$ V. Mapping a continuous input voltage to a discrete variable inherently introduces quantization error into the digitized sample values. The adverse effects of quantization error vary with application. Prior efforts using the same equipment as this research successfully discriminated between hardware devices without experiencing adverse quantization effects [9-11]. For this reason, the effect of quantization error was not addressed or analyzed in this research. The specific bit depth and sampling rate implemented in this research are discussed in further detail in Chapter 3.
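The quantization example above can be reproduced directly. With an 8-bit converter over [0, 2.55] V, each code step is 0.01 V, and rounding to the nearest level bounds the in-range error by half a step. The `quantize` helper is hypothetical and assumes an ideal uniform quantizer with clipping.

```python
def quantize(voltage, v_min=0.0, v_max=2.55, bits=8):
    """Map an analog voltage to the nearest of 2**bits uniformly spaced levels.

    Returns (code, reconstructed_voltage). Values outside [v_min, v_max]
    are clipped to the end levels.
    """
    levels = 2 ** bits                    # 256 levels for 8 bits
    q = (v_max - v_min) / (levels - 1)    # step size: 2.55 / 255 = 0.01 V
    code = round((voltage - v_min) / q)   # nearest level index
    code = min(max(code, 0), levels - 1)  # clip to the valid code range
    return code, v_min + code * q


code, v_hat = quantize(1.234)  # code 123, v_hat ~= 1.23 V
# Within range, the quantization error never exceeds half a step (q / 2 = 0.005 V).
sweep = [i * 2.55 / 999 for i in range(1000)]
max_error = max(abs(v - quantize(v)[1]) for v in sweep)
```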
2.4 Post-Collection Processing

The primary post-collection processing methods used for this research were based on correlation and the Hilbert transform. Collectively, these methods form the basis of the Correlation-Based Anomaly Detection (CBAD) process, which is introduced in this research and serves as its core signal processing engine.

2.4.1 Correlation. The correlation processing used here extends beyond traditional digital communication system applications and is more consistent with what is commonly used in image processing and other fields requiring signal identification in noisy environments [20]. Given two discrete complex-valued sequences $x[n]$ and $y[n]$, the $k$th-lag elements of the auto-correlation ($R_{xx}[k]$) and cross-correlation ($R_{xy}[k]$) sequences are given by,

Rxx[k\  =  'y  (  xnxn_k  ,  (2.1) 

n 

RXy[k\  =  J2xnyn-k,  (2-2) 

n 

respectively, where * denotes the complex conjugate. From an a posteriori probability perspective, classification and verification are related processes that can be implemented independently [11]. However, existing classification processes require considerable resources for a large number of classes and/or class features. Considerable work has been dedicated to quantifying and reducing the computational complexity of such processes [3,43], yet concern remains for implementations on systems having limited or modest computing capability. Correlation-based methods are a less computationally intensive alternative for addressing these concerns and are the foundation of optimal matched filtering applications, one prevalent implementation being the estimation of digital communication symbols [72]. Classification processes vary greatly in execution cost, but the operational cost of the correlation process is predictable and well bounded. For two discrete sequences x[n] and y[n] of length N, this cost is computable and analytically bounded by

$$\mathcal{O}\left(R_{xy}\left[x[n], y[n]\right]\right) \sim \mathcal{O}(N^{2}) , \qquad (2.3)$$

where O(·) denotes computational time complexity.
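A direct evaluation of (2.2) makes the O(N²) bound concrete: summed over all 2N − 1 lags, the overlapping terms total exactly N² multiply-accumulate operations. The sketch below assumes finite sequences that are zero outside their support; the helper name is hypothetical, and passing y = x gives the auto-correlation of (2.1).

```python
def cross_correlation(x, y):
    """Direct evaluation of R_xy[k] = sum_n x[n] * conj(y[n - k]) for equal-length x, y.

    Samples outside 0..N-1 are treated as zero. Returns (lags, values, mac_count),
    where mac_count is the number of multiply-accumulate operations performed.
    """
    n_len = len(x)
    if len(y) != n_len:
        raise ValueError("x and y must have the same length N")
    lags = list(range(-(n_len - 1), n_len))
    values = []
    mac_count = 0
    for k in lags:
        acc = 0j
        for n in range(n_len):
            m = n - k
            if 0 <= m < n_len:  # y[n - k] is zero outside its support
                acc += x[n] * complex(y[m]).conjugate()
                mac_count += 1
        values.append(acc)
    return lags, values, mac_count


lags, r_xx, macs = cross_correlation([1, 2, 3], [1, 2, 3])
```

FFT-based correlation can reduce the cost to O(N log N); the direct form above is the one whose cost (2.3) bounds.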

2.4.2 Hilbert Transform. The Hilbert Transform (HT) is commonly used in audio signal processing applications to stabilize signal amplitude (envelope) estimation [31,71]. The HT of continuous signal x_s(t) is given by [30,32]

$$H(t) = x_s(t) \circledast \frac{1}{\pi t} = \frac{1}{\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} \frac{x_s(\tau)}{t-\tau}\, d\tau , \qquad (2.4)$$

where ⊛ denotes convolution and P.V. denotes the Cauchy principal value. Now letting x[n] be a periodic sequence of N consecutive time samples of x_s(t), the elements of the Discrete Hilbert Transform (DHT) are given by [50]

$$H[n] = \frac{2}{N} \sum_{k\ \mathrm{odd}} x[k] \cot\!\left(\frac{\pi}{N}(n-k)\right) ; \quad n\ \mathrm{even} , \qquad (2.5)$$

$$H[n] = \frac{2}{N} \sum_{k\ \mathrm{even}} x[k] \cot\!\left(\frac{\pi}{N}(n-k)\right) ; \quad n\ \mathrm{odd} . \qquad (2.6)$$
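The even/odd structure of the DHT can be evaluated directly. This sketch assumes N is even, uses zero-based indices n, k = 0, ..., N − 1, and assumes the standard periodic-DHT form with 2/N scaling and a cot(π(n − k)/N) kernel; the function name is illustrative. For a cosine completing an integer number of cycles between 0 and N/2 within the N samples, the output should equal the corresponding sine.

```python
import math


def discrete_hilbert(x):
    """Direct O(N^2) periodic DHT of one period x[0..N-1], N even."""
    n_len = len(x)
    if n_len % 2:
        raise ValueError("this form assumes an even number of samples N")
    h = []
    for n in range(n_len):
        # Even n sums over odd k and odd n sums over even k, so (n - k) is always odd
        # and the cotangent argument is never a multiple of pi.
        acc = 0.0
        for k in range(1 - n % 2, n_len, 2):
            acc += x[k] / math.tan(math.pi * (n - k) / n_len)
        h.append(2.0 / n_len * acc)
    return h


N = 16
x_cos = [math.cos(2 * math.pi * 2 * n / N) for n in range(N)]
h_cos = discrete_hilbert(x_cos)  # approximately sin(2*pi*2*n/16)
```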

Of importance to this research is that the near-field probe and oscilloscope collection process described in Sect. 2.3.2 yields real-valued samples of the collected emission. Thus, the DHT process in (2.5) and (2.6) is readily implemented using the MATLAB® hilbert function. Strictly speaking, the MATLAB® hilbert function returns a complex analytic signal representation whose real In-phase (I) components are the original input sequence and whose imaginary Quadrature (Q) components are the input sequence with a 90° phase shift [62]. The imaginary Quadrature components thus represent the Hilbert transform of the original real sequence. The corresponding instantaneous amplitude response of the real-valued input signal is found by taking the magnitude of each complex I-Q pair and has the same length as the original sampled response.
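The analytic-signal construction and the resulting instantaneous amplitude can be sketched without MATLAB. The sketch follows the frequency-domain procedure described in MATLAB's hilbert documentation (take the DFT, keep the DC and Nyquist bins, double the positive-frequency bins, zero the negative-frequency bins, inverse DFT), but uses a naive O(N²) DFT so it needs only the standard library; all names are illustrative.

```python
import cmath
import math


def dft(x):
    n_len = len(x)
    return [sum(x[k] * cmath.exp(-2j * math.pi * k * m / n_len) for k in range(n_len))
            for m in range(n_len)]


def idft(X):
    n_len = len(X)
    return [sum(X[m] * cmath.exp(2j * math.pi * k * m / n_len) for m in range(n_len)) / n_len
            for k in range(n_len)]


def analytic_signal(x):
    """Return x + j*HT{x} for a real sequence x via one-sided spectrum weighting."""
    n_len = len(x)
    weights = [0.0] * n_len
    weights[0] = 1.0  # DC bin kept as-is
    if n_len % 2 == 0:
        weights[n_len // 2] = 1.0  # Nyquist bin kept as-is
        for m in range(1, n_len // 2):
            weights[m] = 2.0
    else:
        for m in range(1, (n_len + 1) // 2):
            weights[m] = 2.0
    X = dft(x)
    return idft([w * Xm for w, Xm in zip(weights, X)])


def instantaneous_amplitude(x):
    """Magnitude of each complex I-Q pair; same length as the input."""
    return [abs(z) for z in analytic_signal(x)]


N = 32
x_tone = [0.5 * math.cos(2 * math.pi * 3 * n / N) for n in range(N)]
z_tone = analytic_signal(x_tone)
env = instantaneous_amplitude(x_tone)  # every value is approximately 0.5
```

The real part of the output reproduces the input (the I components), the imaginary part is its Hilbert transform (the Q components), and the magnitude recovers the constant 0.5 envelope of the test tone.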

2.5 Verification-Based Discrimination

The verification-based discrimination process used in this research is consistent with the methodology used for biometric identity verification [48]. As implemented here, the one-to-one verification process compares the DUT's current unknown state (as captured in a current RF fingerprint) with a stored reference fingerprint from the same device operating in a known state. This process, using fingerprints from untransformed time domain URE signals, had previously been used to verify PIC micro-controller operation (software discrimination) and to discriminate between PIC micro-controller ICs (hardware discrimination) [11]. The process in these earlier works was adopted here to support an anomalous vs. normal assessment methodology. In this case, an anomaly is any type of response, behavior, etc., that is not deemed normal and that may occur as a result of hardware and/or software failure, degradation, or modification; the focus here was on detecting software anomalies through verification of the operating condition response.

By implementing the general biometric verification process in support of hardware anomaly detection, PIC micro-controller identities have been verified with better than 99.5% accuracy [11]. These previous results increased the envisioned probability of success for the anomalous vs. normal assessment methodology described in Chapter 3, which uses more complicated PLC-based SCADA device operations with the goal of determining the DUT's current operational state. The verification process is implemented by claiming normal operation for every current observation, regardless of the actual (unknown) operation, and then making a final declaration of normal or anomalous. Consistent with the verification outcomes in other verification and detection work [11,48,83], Table 2.1 shows the four possible outcomes of the normal vs. anomalous declaration process. In the context of anomaly detection, the True Anomaly Detection outcome represents success.

Table 2.1: Normal vs. Anomalous Verification Outcomes: A device's current operational state is assessed by claiming Normal and making a final declaration based on operational credential analysis, with the goal of achieving reliable True Anomaly Detection.

| Actual  | Claimed | Declared | Outcome                   |
|---------|---------|----------|---------------------------|
| Normal  | Normal  | Normal   | True Normal Verification  |
| Normal  | Normal  | Anomaly  | False Anomaly Detection   |
| Anomaly | Normal  | Normal   | False Normal Verification |
| Anomaly | Normal  | Anomaly  | True Anomaly Detection    |

2.5.1 ROC Performance Assessment. Quantitative performance assessment of the verification-based anomalous vs. normal assessment is based on Receiver Operating Characteristic (ROC) curve analysis, as commonly used for binary classification problems such as biometric verification [11,48]. Here, verification threshold t_V is set based on training and used to declare (rightly or wrongly) that the current operating condition is normal (verification) or anomalous (detection). For the assessment outcomes in Table 2.1, ROC curves are generated by varying t_V over its valid range and recording, for each value of t_V, the True Anomaly Detection Rate (TADR) (anomalous conditions correctly declared anomalous) and the False Anomaly Detection Rate (FADR) (normal conditions incorrectly declared anomalous). The resultant ROC curve is plotted as TADR versus FADR as t_V varies. The Equal Error Rate (EER) is the point on the ROC curve at which FADR = 1 − TADR = FNVR, where FNVR is the False Normal Verification Rate. The EER provides a single metric for comparing two detection methods, with a lower EER indicating a more effective detection method.
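The threshold sweep and EER can be sketched from two lists of verification test statistics: values observed under truly Normal conditions and under truly Anomalous conditions. The sketch assumes that larger statistics indicate anomalies (declare Anomalous when z_V > t_V), sweeps t_V over the observed values, and approximates the EER at the swept threshold where FADR and FNVR = 1 − TADR are closest; the data and function names are hypothetical.

```python
def roc_points(normal_scores, anomaly_scores):
    """Sweep threshold t_V; declare Anomalous when z_V > t_V.

    Returns a list of (t_V, FADR, TADR) tuples, one per candidate threshold.
    """
    candidates = sorted(set(normal_scores) | set(anomaly_scores))
    # Include a threshold below every score so that (FADR, TADR) = (1, 1) appears.
    thresholds = [candidates[0] - 1.0] + candidates
    points = []
    for t in thresholds:
        fadr = sum(z > t for z in normal_scores) / len(normal_scores)
        tadr = sum(z > t for z in anomaly_scores) / len(anomaly_scores)
        points.append((t, fadr, tadr))
    return points


def equal_error_rate(points):
    """Approximate EER: the swept point where FADR is closest to FNVR = 1 - TADR."""
    t, fadr, tadr = min(points, key=lambda p: abs(p[1] - (1.0 - p[2])))
    return t, (fadr + (1.0 - tadr)) / 2.0


normal_z = [0.1, 0.2, 0.3, 0.4]     # hypothetical z_V values under Norm
anomaly_z = [0.35, 0.5, 0.6, 0.7]   # hypothetical z_V values under Anom
pts = roc_points(normal_z, anomaly_z)
t_eer, eer = equal_error_rate(pts)  # t_eer = 0.35, eer = 0.25
```

With finite data the ROC is a staircase, so FADR and FNVR may never be exactly equal; averaging the two at the closest point is one common approximation, and interpolating between neighboring points is another.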
3. Methodology

This chapter details the methodology implemented to conduct the research and generate the results presented in Chapter 4. The Correlation Based Anomaly Detection (CBAD) process is used to detect anomalous Programmable Logic Controller (PLC) operating conditions, with the goal of reliably differentiating between desired normal (Norm) and undesired anomalous (Anom) operating conditions. In an operational environment, an anomalous operating condition could be triggered by software and/or hardware failure, degradation, etc. Thus, a single verification-based anomaly detection approach was developed here to support 1) software anomaly detection, discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc., and 2) hardware component discrimination, discriminating between various hardware components to detect malfunctioning or counterfeit, trojan, etc., Integrated Circuits (ICs).

Software anomaly detection capability is assessed in Chapter 4 using the proposed CBAD process with three specific collected, sampled, and post-collection processed input sequence types: 1) Time Domain (TD) PLC emission sequences, 2) Hilbert transformed PLC TD emission sequences, and 3) Radio Frequency Distinct Native Attribute (RF-DNA) feature sequences. Hardware discrimination capability is likewise assessed in Chapter 4 using a Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) process with two specific collected, sampled, and post-collection processed input sequence types: 1) RF-DNA feature sequences extracted from TD PLC sequences, and 2) CBAD Correlation Domain (CD) feature sequences extracted from Hilbert transformed TD PLC sequences. Details of the PLC devices, the PLC Norm and Anom Ladder Logic Programs (LLPs), RF emission collection and processing, and the CBAD and GRLVQI verification processes are provided in the following sections.
3.1 PLC Device Description


Chapter 4 results are based on experimentally collected RF emissions from N_D=10 Allen Bradley SLC-500 05/02 Central Processing Unit (CPU) PLC devices. The Device Under Test (DUT) to PLC identity (ID) mapping is presented in Table 3.1. The PLC devices are all the same make and model and were chosen for proof-of-concept demonstration because they 1) are readily available commercially, 2) are prominently used in industry, and 3) have a primary Micro Controller Unit (MCU) with clock speed and internal data bus structure similar to those of the MCUs used in previous related efforts [10,11].

Table 3.1: Device Under Test (DUT) to PLC Identity (ID) Mapping and Class ID Assignment Based on Device Labeling and Logos.

| DUT ID | MCU Label         | MCU Logo  | PLC ID | Class ID |
|--------|-------------------|-----------|--------|----------|
| DUT1   | NXP               | None      | WQ     | 1        |
| DUT2   | NXP               | None      | WV     | 1        |
| DUT3   | None              | Philips   | KG     | 2        |
| DUT4   | None              | Philips   | QI     | 2        |
| DUT5   | Philips           | Philips   | KV     | 3        |
| DUT6   | Philips           | Philips   | OV     | 3        |
| DUT7   | Philips           | Philips   | RG     | 3        |
| DUT8   | None              | Philips   | ZC     | 4        |
| DUT9   | None              | Philips   | ZZ     | 4        |
| DUT10  | Signetics & Intel | Signetics | ZA     | 5        |

The selected PLC devices are visually discernible and were categorized into classes based on differences in their labeling. An additional means of qualitatively categorizing devices is visual analysis of RF emission spectral intensity, a graphical representation of maximum Power Spectral Density (PSD). For the assessment here, spectral intensity plots were generated by collecting N_B=400 emissions from each device executing an arbitrary LLP, using a Low Pass Filter (LPF) with an effective bandwidth of approximately 81.0 MHz to mitigate aliasing effects. Considering an arbitrary sampled TD sequence having N_s total samples, x[n] = {x[n_1], x[n_2], ..., x[n_{N_s}]}, the corresponding PSD components |X[m]| can be obtained using a Discrete Fourier Transform (DFT) given by [72]

$$X[m] = \frac{1}{N_s} \sum_{k=1}^{N_s} x[n_k]\, e^{-j\Phi(N_s,k,m)} ; \quad 1 \le m \le N_s , \qquad (3.1)$$

where

$$\Phi(N_s,k,m) = \frac{2\pi}{N_s}(k-1)(m-1) ; \quad 1 \le m \le N_s . \qquad (3.2)$$
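The DFT and the per-location reduction to a single spectral-intensity value can be sketched as follows. The sketch assumes the standard DFT phase Φ(N_s, k, m) = (2π/N_s)(k − 1)(m − 1) with 1/N_s scaling, maps the 1-based k and m onto Python's 0-based lists, and is an O(N_s²) illustration rather than the processing actually used to produce Fig. 3.1; function names are illustrative.

```python
import cmath
import math


def dft_scaled(x):
    """X[m] = (1/Ns) * sum_{k=1}^{Ns} x[k] * exp(-j * Phi(Ns, k, m)), 1 <= m <= Ns,
    with Phi(Ns, k, m) = (2*pi/Ns) * (k - 1) * (m - 1).

    Python lists are 0-based, so list index i corresponds to k = i + 1 (likewise for m).
    """
    ns = len(x)
    X = []
    for m in range(1, ns + 1):
        acc = 0j
        for k in range(1, ns + 1):
            phi = 2.0 * math.pi / ns * (k - 1) * (m - 1)
            acc += x[k - 1] * cmath.exp(-1j * phi)
        X.append(acc / ns)
    return X


def spectral_intensity(x):
    """Single pixel value for one probe location: max over m of |X[m]|."""
    return max(abs(v) for v in dft_scaled(x))


x_demo = [math.cos(2 * math.pi * 2 * i / 8) for i in range(8)]
X_demo = dft_scaled(x_demo)        # |X| = 0.5 at list indices 2 and 6, about 0 elsewhere
pixel = spectral_intensity(x_demo)  # approximately 0.5
```

Applying spectral_intensity to each of the 20 × 20 = 400 collected emissions and arranging the results by grid position gives one spectral intensity map per device.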

The PLC spectral intensity plots (20×20 arrays of max|X[m]| values) were generated using (3.1) and (3.2) for all N_D=10 devices using N_B=400 total emission collections per device, with one emission collected at each of (N_x=20) × (N_y=20) = 400 uniformly spaced points on a rectangular grid over the DUT surface. The resultant spectral intensity plots, shown in Fig. 3.1, provide an alternate, qualitative means of assigning DUTs to classes. Each point on the 2D plots represents the max|X[m]| of the PSD series associated with the emission collected at that location. Note that Device RG is assigned to Class 3 based on the DUT markings in Table 3.1 but, considering its spectral intensity in Fig. 3.1, bears a closer resemblance to devices in Class 1.


Figure 3.1: Spectral intensity plots generated as emission maximum PSD responses over a 20×20 uniform grid above the PLC MCU surface. The plots enable qualitative device classification based on visual analysis of emission characteristics. With one exception, the responses confirm the PLC class assignments in Table 3.1, which are based on device label markings; the RG PLC response is visually more consistent with Class 1 than with its Class 3 table assignment.
3.2 PLC Operating Conditions

PLC emissions were collected for a single Norm and two Anom operating conditions using the experimental LLPs shown in Fig. 3.2. Prior to collecting PLC responses using the methods described in Sect. 3.4, each PLC device was pre-programmed with the desired LLP, which was then executed repeatedly, with emissions collected, until power was turned off or execution was halted through user intervention. Two major LLP variants were implemented for demonstration, 1) an N_OP=5 version and 2) an N_OP=10 version, for each of the Norm, Anom #1, and Anom #2 operating conditions (six total LLPs).

3.2.1 Ladder Logic Program: N_OP=5. The first LLP variant is used to demonstrate the feasibility of CBAD processing using N_OP=5 LLP operations and consists of a specific order of Move (MOV) and Square-Root (SQR) commands that operate on data within the PLC memory. While not graphically illustrated, the result of each executed operation is saved to registers within the PLC before the next operation is executed. As shown in Fig. 3.2, the anomalous operating condition programs were generated from the Norm operating condition program by reordering (Anom #1) and replacing (Anom #2) specific operations. These anomalous program conditions are intended to mimic potentially disruptive and/or malicious alterations. As seen in Fig. 3.2(a), the Norm LLP consists of alternating MOV and SQR operations: {MOV, SQR, MOV, SQR, MOV}. These were chosen to contrast a relatively short operation (MOV) with a more computationally demanding operation (SQR), in an effort to simplify the first attempt at detecting software anomalies.

The Anom #1 operating condition LLP was created by reordering the 2nd and 3rd operations. The resulting LLP consists of {MOV, MOV, SQR, SQR, MOV}. The Anom #2 operating condition LLP was created by replacing the 4th operation (SQR) with a MOV operation. The resulting LLP consists of {MOV, SQR, MOV, MOV, MOV}.

3.2.2 Ladder Logic Program: N_OP=10. The second LLP variant consists of N_OP=10 total PLC operations and includes a specific order of Move (MOV), Square-Root (SQR), Add (ADD), Multiply (MUL), Subtract (SUB), Divide (DIV), Negate (NEG), Convert To Binary Coded Decimal (TOD), and Convert From Binary Coded Decimal (FRD) commands that operate on data within the PLC memory. As with the N_OP=5 variant, the result of each executed operation is saved to registers within the PLC before the next operation is executed. The operations were selected to exercise the available math functions of the selected PLCs. As shown in Fig. 3.2, the anomalous operating condition programs are generated from the Norm operating condition program by reordering (Anom #1) and replacing (Anom #2) specific operations, and are intended to mimic potentially disruptive and/or malicious alterations.

The Anom #1 operating condition LLP was created by reordering the 5th and 6th operations. The resulting LLP consists of {MOV, SQR, ADD, MUL, DIV, SUB, NEG, TOD, FRD, SQR}. The Anom #2 operating condition LLP was created by replacing the 4th operation (MUL) with an ADD operation. The resulting LLP consists of {MOV, SQR, ADD, ADD, SUB, DIV, NEG, TOD, FRD, SQR}.
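The derivation of the anomalous LLPs from a Norm LLP (one swap of adjacent operations for Anom #1, one substitution for Anom #2) can be captured in a few lines. Positions are 1-based to match the text. The N_OP=10 Norm sequence below is an inference, obtained by undoing the stated 5th/6th swap in the listed Anom #1 program, because this section does not list that Norm program explicitly; the helper names are illustrative.

```python
def reorder(program, pos_a, pos_b):
    """Anom #1 style: swap the operations at 1-based positions pos_a and pos_b."""
    prog = list(program)
    prog[pos_a - 1], prog[pos_b - 1] = prog[pos_b - 1], prog[pos_a - 1]
    return prog


def replace(program, pos, new_op):
    """Anom #2 style: substitute the operation at 1-based position pos."""
    prog = list(program)
    prog[pos - 1] = new_op
    return prog


norm_5 = ["MOV", "SQR", "MOV", "SQR", "MOV"]
anom1_5 = reorder(norm_5, 2, 3)
anom2_5 = replace(norm_5, 4, "MOV")

# Inferred from the listed Anom #1 program by undoing the 5th/6th swap.
norm_10 = ["MOV", "SQR", "ADD", "MUL", "SUB", "DIV", "NEG", "TOD", "FRD", "SQR"]
anom1_10 = reorder(norm_10, 5, 6)
anom2_10 = replace(norm_10, 4, "ADD")
```

Both anomaly types leave the program length unchanged, so any detectable difference must come from operation order or operation type rather than from the number of operations executed per scan.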

3.3 CBAD Processing Overview

The CBAD process was implemented as illustrated in Fig. 3.3 and used to perform verification-based anomaly detection using seven distinct sub-processes:

1. RF Emission Collection: emissions are collected from each PLC operating under Norm and/or Anom conditions as required to support both software anomaly detection and hardware component discrimination assessment.
Figure 3.2: Normal (Norm) and Anomalous (Anom) ladder logic programs for (a) N_OP=5 and (b) N_OP=10 operating conditions. Anomalous conditions are induced through reordering (Anom #1) and replacement (Anom #2) of selected operations.

2. Data Segregation: sequences are divided into independent “Training” (x_Tng[n]) and “Testing” (x_Tst[n]) sets; this distinction is adopted here for consistency with terminology used in the pattern recognition community [22].

3. Normal Reference Sequence Generation: normal reference sequence x_N[n] in Fig. 3.3 is generated using Normal “Training” data in x_Tng[n].

4. Cross-Correlation C_NC[k] Generation: C_NC[k] is generated using the selected x_N[n] reference sequence and a given collected sequence x_C[n] to be verified.
Figure 3.3: Correlation-Based Anomaly Detection (CBAD) process for verifying that the Current unknown sequence x_C[n] is a result of either a) Normal operating conditions (declared when z_V < t_V) or b) Anomalous operating conditions (declared when z_V > t_V). The claimed condition is always normal and is implemented using a correlation reference of x_R = x_N [89].

5. Verification Test Statistic Generation: verification test statistic z_V is generated using a selected Difference Function f_Δ and correlation Difference C_Δ[k], i.e., z_V = f_Δ(C_Δ[k]).

6. Establish Verification Threshold: verification threshold t_V is determined and set using CBAD “Training” statistics computed under Norm operating conditions.

7. Verification Declaration: test statistic z_V^Tst is compared with the established verification threshold t_V and a declaration is made such that z_V^Tst < t_V → Norm and z_V^Tst > t_V → Anom.
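Steps 3 through 7 can be sketched end to end, but the specific choices are deferred to Sect. 3.8, so every concrete choice below is a hypothetical placeholder rather than the CBAD implementation: the reference x_N[n] is taken as the sample-wise mean of the Normal training sequences, C_Δ[k] as the absolute difference between the reference auto-correlation and the cross-correlation with the sequence under test, f_Δ as the mean of C_Δ[k], and t_V as the largest training statistic. Only the structure (always claim Normal, compute z_V, compare with t_V) follows the text.

```python
def correlate(x, y):
    """Linear cross-correlation C[k] = sum_n x[n] * y[n + k] over every overlapping lag."""
    nx, ny = len(x), len(y)
    return [sum(x[n] * y[n + k] for n in range(nx) if 0 <= n + k < ny)
            for k in range(-(nx - 1), ny)]


def reference_sequence(training_normal):
    """Hypothetical step 3: sample-wise mean of equal-length Normal training sequences."""
    count = len(training_normal)
    return [sum(samples) / count for samples in zip(*training_normal)]


def verification_statistic(x_ref, x_cur):
    """Hypothetical steps 4-5: z_V = f_delta(C_delta[k]).

    C_delta[k] = |C_NN[k] - C_NC[k]| (reference auto-correlation minus cross-correlation
    with the sequence under test) and f_delta is the mean; both are placeholder choices.
    """
    c_nn = correlate(x_ref, x_ref)
    c_nc = correlate(x_ref, x_cur)
    c_delta = [abs(a - b) for a, b in zip(c_nn, c_nc)]
    return sum(c_delta) / len(c_delta)


def fit_threshold(x_ref, training_normal):
    """Hypothetical step 6: t_V = largest z_V observed over the Normal training sequences."""
    return max(verification_statistic(x_ref, x) for x in training_normal)


def declare(z_v, t_v):
    """Step 7: the claim is always Normal; declare Anom only when z_V exceeds t_V."""
    return "Anom" if z_v > t_v else "Norm"


training = [[0.0, 1.0, 0.0, -1.0], [0.0, 1.1, 0.0, -0.9], [0.0, 0.9, 0.0, -1.1]]
x_ref = reference_sequence(training)
t_v = fit_threshold(x_ref, training)
print(declare(verification_statistic(x_ref, [0.0, -1.0, 0.0, 1.0]), t_v))  # Anom
```

Setting t_V at the largest training statistic guarantees that every training sequence is declared Norm; a real threshold would normally be chosen from the ROC trade-off described in Sect. 2.5.1.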

More  details  for  each  of  these  CBAD  processing  steps  are  provided  in  Sect.  3.8. 


3.4 RF Emission Processing

Experimentally collected PLC emissions were used to form the required input sequences for the CBAD and RF-DNA processes. RF emissions were collected using a Riscure RF probe attached to a LeCroy 804Zi oscilloscope. All DUT RF emissions were collected at a sample frequency of f_s=250 MSps using a near-field probe having a baseband bandwidth of W_BB=500 MHz. Following collection and sampling, the emissions are post-collection processed using MATLAB® functions. The processing includes filtering, down-converting, and decimating the emissions prior to using them as input sequences for the CBAD process. The following sections provide details on the collection and processing of the RF emissions.

3.4.1 Collection and Sampling. In previous research efforts, the frequency of interest for RF collections against URE devices had been selected based on the harmonics of the target devices' clock frequency [10,11]. The observed MCU clock frequency in the Allen Bradley PLCs was f_clk=18.5 MHz, with the strongest frequency component spectrally aligned with a clock harmonic. As seen in Fig. 3.4, this component is manifest near the 3rd MCU clock harmonic for the Allen Bradley PLCs considered, giving a targeted collection frequency of f_c=55.5 MHz.

Figure 3.4: Representative normalized PSD for the PLC WQ device showing a distinct peak response at f ≈ 55.3 MHz.

To ensure the targeted signal frequency is collected in compliance with Nyquist criteria, the signal is sampled at a rate of f_s=250 MSps. To minimize aliasing caused by frequency components f > 125 MHz, the signal is filtered after collection by the RF probe and prior to sampling using a passive inline LPF having a cutoff frequency of f_co=81.0 MHz, such that all frequency components greater than f_co are attenuated by 3.0 dB or more. The attenuation for frequency values of interest is shown in Fig. 3.5. The collected, filtered, and sampled RF emission is stored as a sequence of real values representing the measured signal voltage at each sample time. The real values are stored as 8-bit integers. The signal collection and storage process, from collection by the RF probe to storage as integer sequences, is performed in real time. Following collection and storage, the signals are processed using MATLAB® prior to being used as inputs for the CBAD process.
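The frequency plan in this section reduces to simple arithmetic, sketched below as a consistency check: the 3rd harmonic of f_clk=18.5 MHz gives the 55.5 MHz collection frequency, and f_s=250 MSps places the Nyquist limit at 125 MHz. The alias helper assumes ideal uniform sampling of a real tone, and the 150 MHz tone is a hypothetical example of the kind of component the inline LPF is meant to suppress.

```python
F_CLK_MHZ = 18.5   # observed MCU clock frequency
HARMONIC = 3       # clock harmonic carrying the strongest component
F_S_MHZ = 250.0    # oscilloscope sample rate (MSps)

f_c_mhz = HARMONIC * F_CLK_MHZ   # targeted collection frequency
f_nyq_mhz = F_S_MHZ / 2.0        # highest frequency representable without aliasing


def alias_frequency(f_mhz, fs_mhz=F_S_MHZ):
    """Apparent frequency of a real tone at f_mhz after ideal sampling at fs_mhz."""
    f_folded = f_mhz % fs_mhz
    return min(f_folded, fs_mhz - f_folded)


margin_ok = f_c_mhz < f_nyq_mhz
# A hypothetical 150 MHz component that passed the LPF would fold to 100 MHz.
alias_150 = alias_frequency(150.0)
```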


Figure 3.5: Frequency response of the f_co=81.0 MHz LPF. The filter is designed to mitigate adverse aliasing effects by attenuating frequency components above f=125 MHz by at least 29.5 dB.

3.4.2 PLC Mainboard Mounting. Prior to emission collection, the DUT must be positioned such that the near-field probe can be placed in close proximity to the MCU on the PLC mainboard. As manufactured, the PLCs do not provide space for the probe because the casing obstructs it, so all mainboards were removed from their casings for this research effort. The entire PLC device plugs into a backplane, which provides power and communication between PLC modules. To provide room to place the probe, the PLC mainboards are connected to the backplane using a set of extension cables. All PLCs are connected using the same set of cables, which must be unplugged from one DUT and plugged into the next between DUT emission collections. The PLC mainboards are placed on a probe table that supports the mainboard and allows the probe to be moved spatially in three dimensions. The probe table provides precise placement of the probe but has no native ability to repeat probe placement positions between collections. A probe placement routine, discussed in Sect. 3.4.3, was therefore used to reliably place the probe prior to each collection. The collection configuration is shown in Figure 3.6.


Figure 3.6: Picture of the XYZ near-field probe station used for collecting PLC emissions.

3.4.3 RF Near-Field Probe Placement. Coarse near-field probe placement was determined once per DUT during initial testing and physically marked for repeated placement during subsequent collections. Alignment to a physical marker is not precise enough to avoid altering the collected emissions between collection sessions in which the DUT must be removed from the probe table and replaced. Probe placement was therefore performed through a two-step process that included 1) Coarse Placement: the probe is placed at a predetermined location on the device surface, and 2) Refined Placement: the probe is repositioned based on RF emission analysis.

The physical location is defined on each device such that the same position, relative to each device's physical attributes, is used for every device. To limit variation between collection locations on the devices, the probe is placed such that two mutually perpendicular lines, each parallel to a physical edge of the PLC MCU, are tangent to the edge of the probe. Figure 3.7 shows the physical location where the probe is placed.


Figure 3.7: The red perpendicular lines are tangent to the near-field probe (blue dot) and identify the location used for PLC MCU emission collection.


The specific probe placement is determined by collecting emissions at N_L=100 locations on a (D_x=10) × (D_y=10) grid over a (x_m=0.75 cm) × (y_m=0.75 cm) square region on the MCU surface. At each location i, a single alignment location sequence x_{A_i}[n] is collected during the PLC execution (scan) of an alignment LLP. The alignment LLP uses the MOV and SQR PLC operations and consists of an ordered sequence of N_OP=6 PLC operations: {MOV, SQR, MOV, SQR, MOV, SQR}. The collected sequences are processed using the same post-collection processing detailed in Sect. 3.5. The details of that processing are not critical to understanding the probe placement routine, but it is important to note that they match those implemented during training and testing of the CBAD process.

Recall that the alignment LLP consists of alternating MOV and SQR operations. An alignment reference LLP is used in conjunction with the alignment LLP to select the probe position prior to collecting emissions from the PLC MCU. The alignment reference LLP consists of a single MOV and a single SQR operation. The sampled, discrete alignment reference signal x_R[n] is collected while the reference LLP is executed by the PLC and represents a single scan of the alignment reference LLP. While there are N_L=100 alignment sequences, {x_{A_1}[n], x_{A_2}[n], ..., x_{A_100}[n]}, there is only a single alignment reference signal x_R[n].

Correlation is the foundation of this research and plays a critical role in nearly every aspect of CBAD processing, including probe placement and alignment. The correlation process is analytically described from a random process perspective in Sect. 2.4.1; for the purposes of this research, a tailored correlation process is implemented. Considering two real-valued discrete input sequences, x[n] having N_x samples and y[n] having N_y samples with N_x < N_y, the jth element of cross-correlation sequence C_xy[k] as implemented in this research is calculated as

$$C_{xy}[k_j] = \sum_{i=1}^{N_x} x[n_i] \cdot y[n_{j+i}] ; \quad 1 \le j \le N_y - N_x . \qquad (3.3)$$
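Equation (3.3) is a sliding inner product of the shorter sequence x[n] against the longer sequence y[n], which is what its summation limits require. The sketch mirrors the 1-based indexing (i = 1, ..., N_x and j = 1, ..., N_y − N_x) on 0-based Python lists; the function name is illustrative.

```python
def tailored_correlation(x, y):
    """C_xy[k_j] = sum_{i=1}^{Nx} x[n_i] * y[n_{j+i}],  1 <= j <= Ny - Nx  (eq. 3.3).

    x is the shorter sequence (Nx < Ny). With 0-based lists, x[n_i] -> x[i - 1]
    and y[n_{j+i}] -> y[j + i - 1].
    """
    nx, ny = len(x), len(y)
    if nx >= ny:
        raise ValueError("expects Nx < Ny")
    return [sum(x[i - 1] * y[j + i - 1] for i in range(1, nx + 1))
            for j in range(1, ny - nx + 1)]


c_demo = tailored_correlation([1, 2], [0, 1, 2, 0, 1, 2])  # [5, 2, 2, 5]
```

Because j starts at 1, the alignment that uses y[n] from its first sample is not evaluated under this indexing; starting j at 0 would include it.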

An automated, repeatable approach for evaluating responses from the N_L=100 probe locations on the DUT was needed to select the location best suited for emission collections. The evaluation criteria are derived from the alignment correlation sequence C_{A_i}[k] that results from applying (3.3) with the ith alignment emission x_{A_i}[n] and the alignment reference emission x_R[n] as inputs. For the ith alignment sequence x_{A_i}[n] having N_A samples and reference emission x_R[n] having N_R samples, the jth element of the ith alignment correlation sequence C_{A_i}[k] for the ith probe location is calculated as

$$C_{A_i}[k_j] = \sum_{m=1}^{N_R} x_R[n_m] \cdot x_{A_i}[n_{j+m}] ; \quad 1 \le j \le N_A - N_R , \qquad (3.4)$$

where m indexes the summation to avoid confusion with location index i.

Correlation sequence peaks provide a measure of performance for each potential probe position. The alignment reference LLP sequence x_R[n] consists of a single {MOV, SQR} sequence, compared to the three {MOV, SQR} sequences in the alignment LLP sequence x_A[n]. Because the alignment LLP consists of the reference LLP repeated N_p=3 times, N_p=3 peaks are expected in the C_A[k] sequence of each collected emission. The probe position on the (D_x=10) × (D_y=10) grid was selected based on a voting process that considers three values for each of the N_L=100 alignment correlation sequences: 1) the maximum correlation value, 2) the mean value of the highest N_p=3 correlation peaks, and 3) the sum of the highest N_p=3 correlation peaks. A probe position producing the highest value for two of the three criteria is used as the position for emission collection. When no single position satisfies these criteria, the probe position yielding the maximum mean correlation of the highest three peaks is used.
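The voting rule can be sketched as follows. The text does not define how correlation peaks are located, so this sketch assumes they are the strict local maxima of each alignment correlation sequence; the peak definition, the location labels, and the function names are all assumptions. Note that when N_p peaks are found at every location, the mean of the top peaks is simply their sum divided by N_p, so criteria 2 and 3 rank locations identically and the two-of-three vote selects that location unless ties occur.

```python
def top_peaks(c, n_peaks=3):
    """Largest n_peaks strict local maxima of a correlation sequence (assumed peak definition)."""
    peaks = [c[k] for k in range(1, len(c) - 1) if c[k - 1] < c[k] > c[k + 1]]
    return sorted(peaks, reverse=True)[:n_peaks]


def select_probe_location(corr_by_location, n_peaks=3):
    """Vote over locations using max value, mean of top peaks, and sum of top peaks.

    corr_by_location maps a location label to its alignment correlation sequence.
    Returns the location winning at least two criteria, else the best mean-of-peaks location.
    """
    scores = {}
    for loc, c in corr_by_location.items():
        peaks = top_peaks(c, n_peaks) or [max(c)]  # fall back to the maximum if no local peak
        scores[loc] = (max(c), sum(peaks) / len(peaks), sum(peaks))
    winners = [max(scores, key=lambda loc: scores[loc][idx]) for idx in range(3)]
    for loc in winners:
        if winners.count(loc) >= 2:
            return loc
    return winners[1]


corr = {
    (0, 0): [0, 5, 0, 1, 0, 1, 0],  # one strong peak, two weak ones
    (0, 1): [0, 4, 0, 4, 0, 4, 0],  # three consistent peaks
    (1, 0): [0, 3, 0, 3, 0, 3, 0],
}
best = select_probe_location(corr)  # (0, 1)
```

In this example the location with the single largest value loses to the location with three consistent peaks, which is the behavior one would want when the alignment LLP is expected to produce three similar peaks.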

3.4.4 PLC LLP Triggering. A trigger was used to initiate RF emission collections based on a Light Emitting Diode (LED) output voltage (V_LED = 5.0 V) assigned as a physical PLC register output during the first MOV operation in each LLP. This output was toggled during each scan, producing a square wave having an approximate 50% duty cycle and frequency f_scn = 1/(2 × T_scn), where T_scn is the approximate time required to complete a single LLP scan. Both the leading and trailing edges of the square wave were used as triggers. Since the PLC outputs are assigned at the end of a scan, the triggered collections actually began just prior to the start of a subsequent scan, with the interval between successive edges (T_scn, one half of the square wave period) providing an approximate measure of collected scan duration.
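The relation between the trigger edges, T_scn, and f_scn can be checked with a short Python sketch. The edge timestamps below are hypothetical values chosen for illustration, not measured PLC data.

```python
# Sketch relating trigger-edge timing to the scan period T_scn and the
# square-wave frequency f_scn = 1 / (2 * T_scn). Edge timestamps are
# hypothetical inputs, in seconds.


def scan_period_from_edges(edge_times):
    """Mean interval between successive trigger edges (rising and falling).

    Because the LED output toggles once per scan and both edges trigger a
    collection, consecutive edges are approximately T_scn apart.
    """
    if len(edge_times) < 2:
        raise ValueError("need at least two edge timestamps")
    gaps = [b - a for a, b in zip(edge_times, edge_times[1:])]
    return sum(gaps) / len(gaps)


def square_wave_frequency(t_scn):
    """f_scn = 1 / (2 * T_scn): one full square-wave period spans two scans."""
    return 1.0 / (2.0 * t_scn)


edges = [0.0, 0.0020, 0.0041, 0.0059, 0.0080]  # hypothetical edge times (s)
t_scn = scan_period_from_edges(edges)          # ~2.0 ms
f_scn = square_wave_frequency(t_scn)           # ~250 Hz
```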

3.5 Post-Collection Processing

Following collection, sampling, and storage, the emissions are post-collection processed using MATLAB®. Before post-collection processing can begin, the sequences are converted from the native Riscure Inspector® software format to a MATLAB®-compatible format. This is accomplished using code developed in support of previous AFIT Unintentional Radiated Emission (URE) research efforts [9]. The code was used in its original, unaltered state and is therefore not discussed in detail here.
Once the collected emissions are converted to a MATLAB®-compatible format, post-collection processing is performed in four primary steps: 1) down-conversion to an Intermediate Frequency (IF), 2) digital bandpass filtering, 3) down-sampling using proper decimation, and 4) application of the selected transform to obtain the final sequence used for verification. Each of these processes is described in greater detail in the following sections.

3.5.1 Down-Conversion and Bandpass Filtering. Following collection and storage of the input sequences, the signals were processed using MATLAB® to isolate specific frequency components of interest, down-convert the signals to near-baseband, and properly decimate the signals to reduce the computational overhead of subsequent processing. The emissions were digitally filtered after collection using an 8th-order Butterworth bandpass filter with a center frequency of f_BP = 55.5 MHz and a −3.0 dB bandwidth of W_BP = 1.0 MHz. The frequency response of the filter is presented in Fig. 3.8. The center frequency was empirically selected based on observing emissions from all PLC devices at the third MCU clock harmonic (f_c = 3 × f_CLK = 55.5 MHz). The center frequency and bandwidth were selected based on analysis of center frequencies f_c = {18.5, 37.0, 55.5, 74.0} MHz, aligned to the first four clock harmonics of the observed MCU clock frequency f_CLK = 18.5 MHz, and −3.0 dB bandwidths of W_BP = {1.0, 2.0, 3.0, 4.0, 5.0} MHz. The analysis demonstrated that the selected bandwidth and center frequency provided the best performance in accurately discriminating between the MOV and SQR PLC operations. Operation discrimination performance was assessed using the CBAD process to generate test statistics with a single training reference sequence and multiple test sequences from the six PLCs that were initially purchased to support the research effort; these are the WQ, WV, RG, KG, KV, and QI devices identified in Table 3.1.

Figure 3.8: Impulse frequency response of the digital 8th-order Butterworth bandpass filter having a center frequency of f_BP = 55.5 MHz and a −3.0 dB bandwidth of W_BP = 1.0 MHz.
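The filter-selection study can be pictured as a search over a small grid of candidate filters. In the Python sketch below, the clock frequency, harmonics, and bandwidths come from the text, while `score_fn` is a hypothetical placeholder for the CBAD-based MOV/SQR discrimination score; the sketch only enumerates the candidates and picks the best-scoring one.

```python
# Sketch of the candidate grid evaluated when choosing the bandpass filter.
# score_fn is a hypothetical stand-in for CBAD-based MOV/SQR discrimination.

F_CLK_MHZ = 18.5                           # observed MCU clock frequency
BANDWIDTHS_MHZ = (1.0, 2.0, 3.0, 4.0, 5.0)  # candidate -3 dB bandwidths


def clock_harmonics(f_clk, count):
    """First `count` harmonics of the clock frequency, in MHz."""
    return [round(h * f_clk, 6) for h in range(1, count + 1)]


def candidate_grid(f_clk=F_CLK_MHZ, n_harmonics=4, bandwidths=BANDWIDTHS_MHZ):
    """All (center, bandwidth, low edge, high edge) combinations, in MHz."""
    return [(fc, bw, fc - bw / 2, fc + bw / 2)
            for fc in clock_harmonics(f_clk, n_harmonics)
            for bw in bandwidths]


def best_filter(score_fn):
    """Return the (center, bandwidth) pair with the highest score."""
    fc, bw, _, _ = max(candidate_grid(), key=lambda g: score_fn(g[0], g[1]))
    return fc, bw
```

The selected pair (55.5 MHz, 1.0 MHz) corresponds to the passband edges of 55.0 and 56.0 MHz cited in the next section.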

3.5.2 Sub-Sampling/Proper Decimation. Based on spectral analysis and in accordance with the Nyquist criterion, the down-converted bandpass TD responses were properly decimated by a factor of 20 to produce sub-sampled sequences at f_s = 12.5 MSps for post-collection processing. By down-converting the filtered signal to f_IF = 2.0 MHz, the original signal content in f ∈ [55.0, 56.0] MHz is relocated to a down-converted range of f ∈ [1.0, 3.0] MHz. With the frequency content of interest centered at f_IF = 2.0 MHz, the signal was filtered using an LPF having a −3.0 dB cutoff frequency of f_LPF = 3.5 MHz and the frequency response seen in Fig. 3.9. The filtered signal is then decimated by a factor of 20, reducing the number of signal samples and yielding a final sampling rate of f_s = 12.5 MSps for the down-converted signal.
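The sample-rate bookkeeping for this step can be checked with a few lines of Python. The 250 MSps pre-decimation rate is not stated explicitly; it is implied by the decimation factor and the final rate. The `decimate` helper is a plain keep-every-20th-sample operation and assumes the LPF has already been applied.

```python
# Sketch of the sample-rate bookkeeping for the LPF + decimation step.
# The pre-decimation rate is implied by the stated values (20 x 12.5 MSps).

F_S_OUT = 12.5e6   # final sampling rate after decimation (Sps)
DECIM = 20         # decimation factor
F_LPF = 3.5e6      # LPF -3 dB cutoff (Hz)
F_IF = 2.0e6       # intermediate frequency (Hz)

f_s_in = F_S_OUT * DECIM        # implied rate before decimation: 250 MSps
nyquist_out = F_S_OUT / 2       # 6.25 MHz after decimation

# The IF content and the LPF cutoff must sit below the new Nyquist frequency,
# otherwise decimation would alias the content of interest.
assert F_IF < F_LPF < nyquist_out


def decimate(x, factor=DECIM):
    """Keep every factor-th sample of an already low-pass-filtered sequence."""
    return x[::factor]
```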

3.5.3 Signal-to-Noise Ratio Scaling. The experimentally collected emissions consisted of two components: 1) a desired signal component x_S[n] and 2) an undesired background noise component x_B[n]. It was assumed that the signal and noise components are independent, that x_S[n] is generally deterministic, and that x_B[n] is a random process; under these assumptions the collected response x_C[n] = x_S[n] + x_B[n] is a random process. One research objective involved assessing verification-based anomaly detection performance under varying SNR conditions, because SNR variation commonly occurs in operational environments as a result of inherent RF channel variation between the RF source and collection receiver. The SNR is calculated as the ratio of average signal power (P_S) to average noise power (P_N), expressed in decibels (dB), i.e., 10 log10 of the power ratio.

Figure 3.9: Impulse frequency response of the digital 8th-order Butterworth LPF having a −3.0 dB bandwidth of W_LP = 3.5 MHz.

To avoid repeated emission collections at varying distances and channel conditions, the SNR variation effects were simulated by adding like-filtered, power-scaled Additive White Gaussian Noise (AWGN) realizations x_N[n] to the as-collected, filtered PLC emissions x_C[n]. The resultant “as evaluated” sequence used for performance assessment and analysis is given by x_A[n] = x_S[n] + x_B[n] + x_N[n], where x_N[n] has been appropriately power-scaled to achieve the desired analysis SNR_A. The average power in an arbitrary complex sequence y[n] having N_y elements can be estimated using

$$P_y \approx \frac{1}{N_y} \sum_{i=1}^{N_y} y[n_i]\, y[n_i]^{*} , \tag{3.5}$$

where * denotes complex conjugate. When y[n] is real-valued, (3.5) becomes

$$P_y \approx \frac{1}{N_y} \sum_{i=1}^{N_y} y[n_i]^{2} . \tag{3.6}$$

Because real-valued PLC collections were used for this research, the expression in (3.6) was appropriate for calculating the required average powers and AWGN power scale factors. Given a desired SNR_A, with x_A[n] = x_S[n] + x_B[n] + x_N[n] representing the analysis sequence, the average powers in x_A[n] and its components can be calculated using (3.6) and are denoted by P_A, P_S, P_B, and P_N for the respective x_A[n], x_S[n], x_B[n], and x_N[n] sequences. Assuming all components of x_A[n] are independent, the total average power in x_A[n] is P_A = P_S + P_B + P_N, and SNR_A can be calculated using

$$\mathrm{SNR}_A = \frac{P_S}{P_B + P_N} . \tag{3.7}$$

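Given that P_S and P_B can be estimated for experimentally collected emissions, the SNR_A expression in (3.7), with SNR_A expressed as a linear ratio, is used to solve for the required P_N in x_N[n] using

$$P_N = \frac{P_S}{\mathrm{SNR}_A} - P_B , \tag{3.8}$$

which in turn is used to calculate the corresponding power scale factor for the AWGN realizations, i.e., x_N[n] = √P_N · x_AWGN[n] for x_AWGN[n] ~ N(0, 1). Because P_N must be positive, (3.8) only admits analysis SNR_A values below the as-collected SNR of P_S/P_B.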

Each collection of “as evaluated” x_A[n] analysis sequences at SNR_A that is input to the CBAD process was generated using a total of N_Nz independent, like-filtered AWGN realizations for x_N[n]. Thus, for a performance assessment based on N_B collected emission sequences {x_C1[n], x_C2[n], ..., x_{CN_B}[n]}, a total of N_Z = N_B × N_Nz sequences are used for each SNR_A considered.
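The SNR scaling in (3.6)-(3.8) can be sketched in Python as follows. This is an illustrative stand-in for the MATLAB processing, under stated assumptions: P_S and P_B are treated as known inputs (their estimation is not shown), the same P_S and P_B are applied to every collected sequence, the noise is not like-filtered, and Python's `random.gauss` supplies the N(0, 1) samples.

```python
import math
import random


def avg_power(y):
    """Average power of a real-valued sequence, as in (3.6)."""
    return sum(v * v for v in y) / len(y)


def required_noise_power(p_s, p_b, snr_a_db):
    """Solve (3.8) for P_N, converting the requested SNR_A from dB to linear."""
    snr_lin = 10.0 ** (snr_a_db / 10.0)
    p_n = p_s / snr_lin - p_b
    if p_n <= 0:
        raise ValueError("SNR_A must be below the as-collected SNR P_S / P_B")
    return p_n


def noisy_realizations(x_c, p_n, n_nz, rng):
    """n_nz copies of x_C[n] + sqrt(P_N) * N(0, 1) noise.

    The like-filtering of the noise used in the source is omitted here.
    """
    scale = math.sqrt(p_n)
    return [[v + scale * rng.gauss(0.0, 1.0) for v in x_c] for _ in range(n_nz)]


def build_analysis_set(collected, p_s, p_b, snr_a_db, n_nz, seed=0):
    """Return the N_Z = N_B * N_Nz analysis sequences for one SNR_A."""
    rng = random.Random(seed)
    p_n = required_noise_power(p_s, p_b, snr_a_db)
    out = []
    for x_c in collected:
        out.extend(noisy_realizations(x_c, p_n, n_nz, rng))
    return out
```

For example, an as-collected SNR of 20 dB (P_S = 1, P_B = 0.01) and a target SNR_A of 10 dB call for P_N = 0.1 - 0.01 = 0.09.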
3.6 Sequence Transformation

Prior to input to the CBAD process, the sequences are transformed using one of three methods implemented under this research: 1) an absolute value function, 2) a Hilbert transform function, and 3) an RF-DNA transform. In addition, the resultant absolute value and Hilbert transform sequences are normalized to produce x̄[n], which is input to the CBAD process. For sequence x[n], normalization of the ith element is given by

$$\bar{x}[n_i] = \frac{x[n_i]}{\max(x[n])} . \tag{3.9}$$


The absolute value function is the simplest; it transforms a given emission sequence x[n] by computing the magnitude of each sequence element. This process is well-known and does not warrant additional discussion. The Hilbert transform and RF-DNA transform methods are more complicated and are discussed further in the following sections.
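As a minimal Python sketch of the absolute value transform followed by the normalization in (3.9), assuming a sequence whose maximum magnitude is nonzero:

```python
def normalize_max(x):
    """Eq. (3.9): divide every element by the maximum of the sequence."""
    peak = max(x)
    if peak == 0:
        raise ValueError("sequence maximum is zero; cannot normalize")
    return [v / peak for v in x]


def abs_transform(x):
    """Absolute-value transform followed by max normalization."""
    return normalize_max([abs(v) for v in x])


print(abs_transform([-2.0, 1.0, 4.0, -8.0]))  # [0.25, 0.125, 0.5, 1.0]
```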


3.6.1 Hilbert Transform. The CBAD verification process is agnostic to what the sequence elements represent, and the above process is applicable to all real-valued sequences x[n]. Thus, the sequences can be generated as either the magnitude of untransformed real-valued TD sequences (|x[n]|) or the magnitude of Hilbert transformed TD responses (|H[x[n]]|). The transition to |H[x[n]]| sequences was motivated by previous research showing that anomaly detection capability using |x[n]| sequences is negatively impacted by cross-collection variance in RF emissions. The observed misalignment (cross-collection time registration) of data sets was often less than ±10 TD samples, yet the resulting variation degraded verification performance considerably. Thus, as in audio signal processing applications, the Hilbert transform is used to stabilize the signal's amplitude estimates [32, 71].

Recall that the frequency components of interest are constrained to a frequency range centered at f_c = 55.5 MHz with a bandwidth of W_BP = 1 MHz. Similar to the work in [101], the Hilbert transform provides a means of estimating an amplitude envelope for a narrowly defined frequency range. The amplitude estimate sequence is obtained from the Hilbert transformed sequence by calculating the magnitude of the complex pair representing each element of the Hilbert transformed sequence. The Hilbert transform in (2.4) effectively shifts the phase of a continuous signal by φ = π/2 radians for all frequency components. The MATLAB® hilbert function is used to generate the transformed discrete sequence H[x[n]] for a given real-valued sequence x[n]. The hilbert function in MATLAB® returns a complex, time-analytic representation of the signal having In-phase (I) and Quadrature (Q) components. The magnitude of a discrete Hilbert sequence, |H[x[n]]|, represents the instantaneous amplitude or envelope of the discrete sequence x[n]. The input sequence to the CBAD process is the magnitude of the Hilbert transformed sequence, |H[x[n]]|, which is the same length as the TD sequence x[n]. Consider a Hilbert transformed sequence H[x[n]] generated using the hilbert function. Any element H[x[n_i]] of the sequence consists of both real, H_re[x[n_i]], and imaginary, H_im[x[n_i]], components representing the In-phase and Quadrature components of the real signal. The amplitude estimate sequence |H[x[n]]| is defined for any element H[x[n_i]] by calculating the 2-norm, with each real-imaginary pair considered a vector:

$$|H[x[n_i]]| = \left\| \left[ H_{re}[x[n_i]],\; H_{im}[x[n_i]] \right] \right\|_2 = \sqrt{\left(H_{re}[x[n_i]]\right)^2 + \left(H_{im}[x[n_i]]\right)^2} . \tag{3.10}$$

A representative magnitude TD |x[n]| sequence is shown in Fig. 3.10 along with its corresponding magnitude |H[x[n]]| sequence.
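The envelope computation in (3.10) can be sketched without MATLAB. The Python code below builds the analytic signal by zeroing the negative-frequency DFT bins and doubling the positive ones, which is the frequency-domain construction described in the MATLAB `hilbert` documentation. It uses a naive O(N^2) DFT purely to stay self-contained, so it is only suitable for short illustrative sequences.

```python
import cmath
import math


def _dft(x, sign):
    """Naive O(N^2) DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def analytic_signal(x):
    """Analytic signal x[n] + j*H{x}[n] built in the frequency domain.

    Negative-frequency bins are zeroed and positive ones doubled; the DC bin
    (and the Nyquist bin for even lengths) is kept with weight 1.
    """
    n = len(x)
    spectrum = _dft(x, -1)
    h = [0.0] * n
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        for k in range(1, n // 2):
            h[k] = 2.0
    else:
        for k in range(1, (n + 1) // 2):
            h[k] = 2.0
    return [v / n for v in _dft([s * w for s, w in zip(spectrum, h)], +1)]


def hilbert_envelope(x):
    """|H[x[n]]| as in (3.10): 2-norm of each (real, imaginary) pair."""
    return [math.hypot(z.real, z.imag) for z in analytic_signal(x)]
```

The real part of the analytic signal reproduces x[n], and its magnitude is the envelope that would then be normalized with (3.9) before entering the CBAD process.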

3.6.2 RF-DNA Transform. The RF-DNA transform was implemented according to the process in [9,21,39,58,76,102] and was used in this research to reduce the dimensionality of the input sequences and identify those signal attributes that aid in the discrimination of hardware devices. The RF-DNA transform is a mechanical process that calculates sequence attributes without reliance or dependence on the source of the sequence. It has been used previously on TD, Spectral Domain (SD), and Time-Frequency (T-F) sequences [11,42,81]. For the purpose of this research effort, the RF-DNA transform was applied to TD emissions only.

Figure 3.10: Representative responses for the first two LLP operations (SQR and MOV) under Norm operating conditions: (Top) magnitude of TD |x[n]| sequence and (Bottom) corresponding magnitude of Hilbert transform |H[x[n]]| sequence.

RF-DNA reduces the dimensionality of the TD sequences by calculating statistical values (standard deviation σ, variance σ², skewness γ, and kurtosis κ) for instantaneous sequence attributes (amplitude a, phase φ, and frequency f) over N_R specified signal regions of an arbitrary input sequence x[n]. Features are calculated for each specified region of the sequence and concatenated to form an RF-DNA fingerprint f_TD representing the entire sequence x[n].

Each DUT emission collected for this research was stored as a real-valued TD sequence x[n]. Given that the RF-DNA process is inherently based on complex IQ input sequences, the collected x[n] here were converted to complex IQ sequences of the form x_IQ[n] = x_re[n] + j x_im[n] using the hilbert function in MATLAB®. Composite RF-DNA fingerprints were generated from selected sequences using the following steps:

1. A given sequence x_IQ[n] is divided into N_R equal-length contiguous subregions.

2. Within a given subregion, the mean μ is calculated and subtracted from all subregion samples to minimize the impact of collection bias.

3. The desired instantaneous feature sequence(s) (phase φ[n], amplitude a[n], and/or frequency f[n]) are calculated for the subregion samples.

4. Selected statistical attributes of standard deviation σ, variance σ², skewness γ, and/or kurtosis κ are calculated using all samples within the subregion.

5. The resultant statistical attributes are concatenated to form a single Regional Fingerprint sequence with elements arranged in order of signal feature and statistical attribute.

6. Steps 2-5 are repeated for each subregion of x_IQ[n], and the N_R Regional Fingerprint sequences are concatenated to form the Composite Fingerprint sequence for x_IQ[n].

The following sections provide more detail for each of the processes used to generate a Composite Fingerprint sequence for a TD x_IQ[n] sequence.


3.6.2.1 Instantaneous Feature Calculation. The first step in generating an RF-DNA fingerprint from a TD signal is calculation of selected instantaneous signal features for the sampled TD signal. For the element x_IQ[n_i] = x_re[n_i] + j x_im[n_i], the instantaneous amplitude a[n_i], phase φ[n_i], and frequency f[n_i] sequence elements were calculated using [103]

$$a[n_i] = \sqrt{x_{re}^2[n_i] + x_{im}^2[n_i]} , \tag{3.11}$$

$$\phi[n_i] = \tan^{-1}\!\left[\frac{x_{im}[n_i]}{x_{re}[n_i]}\right] , \quad x_{re}[n_i] \neq 0 , \tag{3.12}$$

$$f[n_i] = \frac{1}{2\pi}\left[\frac{d\phi[n_i]}{dn}\right] , \tag{3.13}$$

where 1 ≤ i ≤ N_x and N_x is the total number of elements in x_IQ[n].
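A Python sketch of (3.11)-(3.13) for a complex IQ sequence follows. Two choices are assumptions rather than details from the text: `math.atan2` replaces the single-argument arctangent so that the x_re[n_i] = 0 case is covered, and the derivative in (3.13) is approximated by a first difference of the unwrapped phase.

```python
import math


def instantaneous_features(x_iq):
    """Instantaneous amplitude, phase, and frequency of a complex IQ sequence.

    Amplitude follows (3.11). Phase uses math.atan2, a four-quadrant variant
    of (3.12) that also covers x_re = 0 (an assumption). Frequency
    approximates (3.13) by a first difference of the unwrapped phase, so it
    is in cycles per sample and has one fewer element than the input.
    """
    amp = [abs(z) for z in x_iq]
    phase = [math.atan2(z.imag, z.real) for z in x_iq]
    unwrapped = [phase[0]]
    for prev, cur in zip(phase, phase[1:]):
        step = cur - prev
        step -= 2 * math.pi * round(step / (2 * math.pi))  # wrap step to [-pi, pi]
        unwrapped.append(unwrapped[-1] + step)
    freq = [(b - a) / (2 * math.pi) for a, b in zip(unwrapped, unwrapped[1:])]
    return amp, phase, freq
```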
For consistency with previous research, the TD sequences in (3.11)-(3.13) are centered and normalized using (3.9); centering simply removes the sequence mean (μ) prior to normalization. The ith elements of the centered and normalized sequences a_c[n_i], φ_c[n_i], and f_c[n_i] are calculated using [76]

$$a_c[n_i] = \frac{a[n_i] - \mu_a}{\max_{1 \le j \le N_x} \{ a[n_j] - \mu_a \}} , \tag{3.14}$$

$$\phi_c[n_i] = \frac{\phi[n_i] - \mu_\phi}{\max_{1 \le j \le N_x} \{ \phi[n_j] - \mu_\phi \}} , \tag{3.15}$$

$$f_c[n_i] = \frac{f[n_i] - \mu_f}{\max_{1 \le j \le N_x} \{ f[n_j] - \mu_f \}} , \tag{3.16}$$

where 1 ≤ i ≤ N_x and N_x is the total number of elements in x_IQ[n].

The resultant a_c[n], φ_c[n], and f_c[n] sequences are divided into N_R specified regions prior to calculating the desired statistics of standard deviation σ, variance σ², skewness γ, and/or kurtosis κ for the signal attribute sequences. Additionally, the statistics can be calculated over the entire signal response (union of all subregion samples). The final Composite Fingerprint sequence for x_IQ[n] is formed by concatenating the statistics of all subregions, together with the entire-region statistics if generated. This is illustrated in Fig. 3.11, which shows an abstract representation of RF-DNA fingerprint generation using an arbitrary feature sequence [103].
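Combining the centering and normalization in (3.14)-(3.16) with the regional statistics gives the Python sketch below. It assumes population moments for σ, σ², γ, and κ (non-excess kurtosis), drops any remainder samples when the sequence length is not divisible by N_R, omits the per-subregion IQ mean removal of step 2, and uses a region-then-feature-then-statistic ordering; none of these details is confirmed by the text.

```python
import math


def center_normalize(seq):
    """(3.14)-(3.16): remove the mean, then divide by the largest centered value."""
    mu = sum(seq) / len(seq)
    centered = [v - mu for v in seq]
    peak = max(centered)
    return [v / peak for v in centered] if peak != 0 else centered


def region_stats(seq):
    """[sigma, sigma^2, skewness gamma, kurtosis kappa] from population moments."""
    n = len(seq)
    mu = sum(seq) / n
    m2 = sum((v - mu) ** 2 for v in seq) / n
    if m2 == 0:
        return [0.0, 0.0, 0.0, 0.0]
    m3 = sum((v - mu) ** 3 for v in seq) / n
    m4 = sum((v - mu) ** 4 for v in seq) / n
    sd = math.sqrt(m2)
    return [sd, m2, m3 / sd ** 3, m4 / m2 ** 2]


def split_regions(seq, n_r):
    """N_R equal-length contiguous subregions (remainder samples are dropped)."""
    size = len(seq) // n_r
    return [seq[r * size:(r + 1) * size] for r in range(n_r)]


def composite_fingerprint(features, n_r, include_full=True):
    """Concatenate per-region statistics of each feature sequence.

    features: e.g. [a, phi, f] instantaneous sequences; each is centered and
    normalized first. The ordering and the optional whole-sequence block
    appended at the end are assumptions of this sketch.
    """
    feats = [center_normalize(f) for f in features]
    per_feature = [split_regions(f, n_r) for f in feats]
    fingerprint = []
    for r in range(n_r):
        for regions in per_feature:
            fingerprint.extend(region_stats(regions[r]))
    if include_full:
        for f in feats:
            fingerprint.extend(region_stats(f))
    return fingerprint
```

With three features, four statistics, and N_R subregions, the fingerprint has (N_R + 1) × 3 × 4 elements when the whole-sequence statistics are included.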


Figure 3.11: Abstract representation of RF-DNA fingerprint formation for an arbitrary sequence divided into N_R subregions [103]. Standard deviation (σ), variance (σ²), skewness (γ), and/or kurtosis (κ) are commonly used as RF-DNA features.

3.7 Region of Interest Selection

When processed according to Sect. 3.5, the resultant sequences represent the emission data across an entire LLP scan. Previous efforts that targeted URE responses were based on experiments where the researcher had precise control of the devices being analyzed [10, 11]. As previously discussed, the PLC scan includes not only the logic operations explicitly defined by the LLP, but also the process of evaluating the physical device input values and assigning physical device output values. Additionally, the PLC device must also perform low-level Operating System (OS) functions such as system memory management and interrupt polling. Consequently, significant portions of the RF emission signal are not directly attributable to the LLP operations, as seen in Fig. 3.12. The signal attributable to the specified LLP operations must be extracted from the entire scan signal to produce a Region Of Interest (ROI). A representative signal collected from an entire scan, with the ROI highlighted, is pictured in Fig. 3.12. Once the ROI has been identified in a single scan signal, it must be successfully and automatically extracted from the scan signal content. Previous research efforts have involved IRE with clear, definable communication bursts that are clearly separable from the channel noise [34,39,76,97,102]. Considerable research has been dedicated to detecting and extracting bursts from communication signals [35,56,58]. The signals considered for this research effort are URE signals collected from operational PLC equipment, which follow a more continuous broadcast model, as opposed to the burst broadcast model of communication IRE devices. Additionally, the TD and SD structure of URE emissions is not specified or engineered for collection and processing, and it differs significantly in both domains across semiconductor devices. These attributes of URE signals pose a unique challenge when extracting the ROI for use in hardware or software anomaly detection.

Figure 3.12: Representative TD collected sequences from a PLC device operating under Norm and Anom conditions as indicated. The highlighted ROI regions represent the response of one full LLP scan.

A correlation-based approach is used in this research to extract the ROI. Each implemented LLP begins with an alignment reference comprised of the {MOV, SQR} LLP operation sequence previously used for probe placement. The {MOV, SQR} sequence is used to detect the beginning of the ROI containing operation-attributable signal content. Each LLP used in this research concludes with either a MOV or SQR operation, designating the termination of the operation-attributable signal content. It is important to distinguish between the use of LLP operations to detect ROIs and the use of PLC outputs to trigger the collection. Because output values are assigned after the logical operations are performed, the physical trigger used to initiate the collection of emissions is not aligned to the operation-attributable signal content. Additionally, unpredictable operations performed by the PLC MCU preclude the use of a static alignment method.
The following steps are performed on each collected, stored, and post-collection processed signal x_C[n]. The signal resulting from collections against the {MOV, SQR} LLP, x_AS[n], is referred to as the alignment start reference. The alignment start reference consists only of signal content attributable to the {MOV, SQR} operations, extracted from a representative collected signal prior to the alignment process. The signal resulting from collections against either a {SQR, MOV} LLP (for N_op = 5) or a {SQR} LLP (for N_op = 10), x_AE[n], is referred to as the alignment end reference. The alignment start and end references are collected from the same DUT as the sequences that are the source of the ROIs. The correlation process is identical to the correlation process (3.3) used in the probe alignment process discussed in Sect. 3.4.3. The goal in this case is to provide a means of not only extracting the ROI from the burst, but also ensuring that the operation-attributable content of the ROIs is not corrupted by non-operation-attributable signal content. In addition to the non-attributable signal content exhibited in Fig. 3.12, the PLC DUT also performs OS and system maintenance functions that may occur between the execution of the LLP operations. Pristine ROIs are those containing only operation-attributable signal content, while corrupted ROIs contain non-attributable content.

The ROIs are extracted and declared pristine or corrupted for a single collected sequence x_C[n] and alignment reference sequences x_AS[n] and x_AE[n] according to the following:

1. Consider collected sequence x_C[n] = {x_C[n_i]}, i = 1, 2, ..., N_C, and two alignment reference sequences denoted by x_AS[n] = {x_AS[n_i]}, i = 1, 2, ..., N_AS, and x_AE[n] = {x_AE[n_i]}, i = 1, 2, ..., N_AE, all based on collections from the same DUT.

2. The x_C[n], x_AS[n], and x_AE[n] sequences are all collected and post-collection processed using identical methods, i.e., filtering, down-sampling, and sequence transformation.
3. Two alignment signals are used:

1) x_A1[n] = {x_AS[n_1], x_AS[n_2], ..., x_AS[n_{N_AS}]} and

2) x_A2[n] = {x_AE[n_{N_AE}], x_AE[n_{N_AE−1}], ..., x_AE[n_1]}.

4. The sequence x_C[n] is used to form two sequences:

1) x_C1[n] = {x_C[n_1], x_C[n_2], ..., x_C[n_{N_C}]} and

2) x_C2[n] = {x_C[n_{N_C}], x_C[n_{N_C−1}], ..., x_C[n_1]}.

5. Cross-correlation sequence C_{C1,A1}[k] is calculated using (3.3) with the x_C1[n] and x_A1[n] sequences as inputs. The maximum value max(C_{C1,A1}[k]) = C_{C1,A1}[k_{iMax1}] and its index i_Max1 are found and stored. The value i_start = i_Max1 represents the estimated sample number for the ROI start.

6. Cross-correlation sequence C_{C2,A2}[k] is calculated using (3.3) with the x_C2[n] and x_A2[n] sequences as inputs. The maximum value max(C_{C2,A2}[k]) = C_{C2,A2}[k_{iMax2}] and its index i_Max2 are found and stored. The value i_Max2 represents the estimated number of samples from the end of signal x_C[n] to the end of the ROI. The estimated end of the ROI is i_end = N_C − i_Max2 samples from the beginning of signal x_C[n].

7. For each signal x_C[n], three criteria are used to select the ROIs:

1) Maximum correlation value for ROI start, C_MS = max(C_{C1,A1}[k]);

2) Maximum correlation value for ROI end, C_ME = max(C_{C2,A2}[k]);

3) Estimated length of the ROI in samples, N_ROIEst = i_end − i_start.
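Steps 3-7 can be sketched in Python as shown below. Because (3.3) is defined in an earlier section, a per-lag Pearson correlation coefficient over full-overlap windows stands in for it here; that substitution, the 0-based indexing, and treating i_end as an exclusive index are assumptions of the sketch.

```python
import math


def norm_xcorr(x, ref):
    """Pearson correlation of ref with every full-overlap window of x.

    Stand-in for (3.3): lag k compares x[k:k+len(ref)] with ref.
    """
    m = len(ref)
    r_mu = sum(ref) / m
    r_dev = [v - r_mu for v in ref]
    r_norm = math.sqrt(sum(v * v for v in r_dev))
    out = []
    for k in range(len(x) - m + 1):
        w = x[k:k + m]
        w_mu = sum(w) / m
        w_dev = [v - w_mu for v in w]
        denom = math.sqrt(sum(v * v for v in w_dev)) * r_norm
        out.append(sum(a * b for a, b in zip(w_dev, r_dev)) / denom if denom else 0.0)
    return out


def argmax(seq):
    """Index of the largest element (first one on ties)."""
    return max(range(len(seq)), key=seq.__getitem__)


def estimate_roi(x_c, x_as, x_ae):
    """Return (i_start, i_end, C_MS, C_ME, N_ROIEst) for one collected sequence."""
    c1 = norm_xcorr(x_c, x_as)               # x_C1 = x_C, x_A1 = x_AS
    c2 = norm_xcorr(x_c[::-1], x_ae[::-1])   # x_C2, x_A2: time-reversed copies
    i_start = argmax(c1)
    i_max2 = argmax(c2)                      # samples from end of x_C to end of ROI
    i_end = len(x_c) - i_max2                # exclusive end index of the ROI
    return i_start, i_end, c1[i_start], c2[i_max2], i_end - i_start
```

Time-reversing both x_C[n] and x_AE[n] lets the same start-finding search locate the end reference, with i_Max2 counted from the end of x_C[n], which is why i_end = N_C - i_Max2.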

The preceding steps are repeated for all N_B potential sequences P_xC[n] = {x_C1[n], x_C2[n], ..., x_{CN_B}[n]} to generate sets of criteria values associated with the signals. Potential sequences are those that are still considered candidates for contributing a non-corrupted ROI. Let C_MS[n] = {C_MS[n_1], C_MS[n_2], ..., C_MS[n_{N_B}]} be the set of maximum correlation values for the ROI start, such that C_MS[n_k] is the maximum start correlation value for collected sequence x_Ck[n]. Similarly, let C_ME[n] = {C_ME[n_1], C_ME[n_2], ..., C_ME[n_{N_B}]} be the set of maximum correlation values for the ROI end generated from the N_B collected sequences, and let N_ROIEst[n] = {N_ROIEst[n_1], N_ROIEst[n_2], ..., N_ROIEst[n_{N_B}]} be the set of estimated ROI sample lengths for the collected sequences. Once the criteria have been calculated for all signals considered, the following steps are used to assign a rank to the signals so they can be sorted in order of priority, from most pristine to most corrupted. The criteria are used to remove sequences from the list P_xC[n] based on their evaluation.

The first step in extracting sequences that are not corrupted by non-attributable content involves removing sequences based on estimated ROI length N_ROIEst[n]. Initially, all collected sequences are considered equal and kept in a set of potential sequences P_xC[n] = {x_C1[n], x_C2[n], ..., x_{CN_B}[n]}. For each collected sequence x_Ck[n], 1 ≤ k ≤ N_B, the estimated ROI length is compared to an established threshold. It is assumed that ROIs from sequences corrupted by non-operation-attributable content will be longer, in samples, than those that do not contain the extra content; sequences with an estimated ROI length exceeding the established threshold are therefore removed from the potential sequence list P_xC[n]. The threshold is established from the mean μ_ROIEst and standard deviation σ_ROIEst of the estimated ROI sample length set N_ROIEst[n]. A sequence x_Ck[n] is removed from the set of potential sequences P_xC[n] if its estimated ROI length exceeds the threshold, i.e., N_ROIEst[n_k] > μ_ROIEst + 0.3σ_ROIEst. This threshold was empirically chosen to offer an acceptable balance between removing sequences with potentially corrupted ROIs and keeping an adequate number of sequences for evaluation of the CBAD process.

Following removal of sequences that exceed the threshold N_ROIEst[n] > μ_ROIEst + 0.3σ_ROIEst, the remaining sequences in the list are assigned a rank r_x[n] based on the maximum start and end correlation values C_MS and C_ME. A sequence x_Ck[n] in the list P_xC[n] is assigned rank r_x[n_k] = μ_MS,ME[n_k], where μ_MS,ME[n_k] is the mean of the 2-element set {C_MS[n_k], C_ME[n_k]}, i.e., the maximum correlation values for the estimated start and end of the ROI for sequence x_Ck[n]. The remaining sequences in the potential sequence set P_xC[n] are then reordered in descending rank such that P_xC[n_1] is the sequence with the maximum value of μ_MS,ME[n]. Sequences considered for evaluation are taken from the sorted set P_xC[n]. For a desired number of selected sequences N_Sel, the set of ordered sequences P_Sel[n] = {P_xC[n_1], P_xC[n_2], ..., P_xC[n_{N_Sel}]} is used for evaluation of the process.

Once the final list of N_Sel selected sequences P_Sel[n] = {P_Sel[n_1], P_Sel[n_2], ..., P_Sel[n_{N_Sel}]} has been established, the ROIs must be extracted. The ROIs are extracted based on the sample index i_start where the maximum start correlation value was found. The ROI length N_ROI is established based on the empirically observed ROI length for a representative sequence. For a sequence x_C[n] = P_Sel[n_k] = {x_C[n_1], x_C[n_2], ..., x_C[n_{N_C}]} with maximum start correlation index i_Max, the ROI is x_ROI[n] = {x_C[n_{i_Max}], x_C[n_{i_Max+1}], ..., x_C[n_{i_Max+N_ROI−1}]}.
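The length-based removal, ranking, and fixed-length extraction can be sketched in Python as follows. The `Candidate` record is a hypothetical container, and using the population standard deviation for σ_ROIEst is an assumption; the 0.3 multiplier and the mean-of-two-correlations rank follow the text.

```python
import statistics
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str        # hypothetical identifier for a collected sequence x_Ck[n]
    i_start: int     # index of the maximum start correlation
    c_ms: float      # maximum start correlation C_MS
    c_me: float      # maximum end correlation C_ME
    n_roi_est: int   # estimated ROI length N_ROIEst


def select_candidates(cands, n_sel, k_sigma=0.3):
    """Drop over-long ROIs, then rank by mean(C_MS, C_ME) in descending order."""
    lengths = [c.n_roi_est for c in cands]
    threshold = statistics.fmean(lengths) + k_sigma * statistics.pstdev(lengths)
    kept = [c for c in cands if c.n_roi_est <= threshold]
    kept.sort(key=lambda c: (c.c_ms + c.c_me) / 2, reverse=True)
    return kept[:n_sel]


def extract_roi(x_c, i_start, n_roi):
    """x_ROI[n] = {x_C[n_iMax], ..., x_C[n_iMax + N_ROI - 1]} (0-based slice)."""
    return x_c[i_start:i_start + n_roi]
```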

3.8  CBAD  Processing 

The CBAD overview was presented in Sect. 3.3; more details on key processing steps are provided here. The first step involved collecting RF emission sequences from each DUT operating under Norm, Anom #1, and Anom #2 operating conditions. For CBAD evaluation under this research, the collections were performed for ten PLC devices executing both the N_op = 5 (Sect. 3.2.1) and N_op = 10 (Sect. 3.2.1) LLPs; the actual number of LLP operations is not significant for the remaining discussion in this section. The sequences are collected and stored as outlined in Sect. 3.4 and post-collection processed as outlined in Sect. 3.5.

The  CBAD  process  is  presented  once,  but  is  repeated  for  each  device  and  at 
each  desired  SNR  independently.  The  Normal  reference  sequence  is  only  generated 
once  and  is  not  scaled  for  different  SNR  values. 

3.8.1 Testing and Training Set Generation. The ROIs are extracted from the collected and processed bursts as outlined in Sect. 3.7. The number of ROIs selected varied from N_B=1 collected ROI for an initial proof of concept to N_B=1000 ROIs for the final results; the number of collected sequence ROIs is not significant to the discussion of the process. The ROIs are separated into two independent “Training” (x_Tng[n]) and “Testing” (x_Tst[n]) data sets; the “Training” and “Testing” distinction is adopted here for consistency with terminology used in the pattern recognition community [22]. The training sequences were selected using an interleaved pattern. Assume a total of N_B=1000 ROIs are in the set x_C[n] and %Tng=5% are selected as training sequences; a total of N_Tng=50 sequences are then used for training. Whenever %Tng × N_B does not yield an integer, N_Tng is rounded down to the nearest integer. Using the interleaved selection pattern, the training set x_Tng[n] is constructed by taking sequences from x_C[n] at a regular interval (e.g., every other sequence, i.e., the odd-numbered ones, when %Tng=50%). The remaining N_Tst=N_B−N_Tng sequences in x_C[n] (e.g., the even-numbered ones) are placed in the testing set x_Tst[n].
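The interleaved selection can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the original processing code: the helper name `interleaved_split` is hypothetical, and it assumes the N_Tng training ROIs are taken at a fixed stride of ⌊N_B/N_Tng⌋ starting with the first ROI, which reproduces the odd/even example above when %Tng=50%.

```python
import math


def interleaved_split(rois, pct_tng):
    """Hypothetical interleaved training/testing split of collected ROIs.

    rois    : list of collected ROI sequences x_C[n] (N_B entries)
    pct_tng : training percentage %Tng (e.g., 5.0 for 5%)

    Assumption: N_Tng = floor(%Tng/100 * N_B) training ROIs are taken at a
    fixed stride starting with the first ROI; all others become testing ROIs.
    """
    n_b = len(rois)
    n_tng = math.floor(pct_tng * n_b / 100)  # round down to an integer
    if n_tng == 0:
        return [], list(rois)
    stride = n_b // n_tng
    tng_idx = set(range(0, stride * n_tng, stride))
    x_tng = [r for i, r in enumerate(rois) if i in tng_idx]
    x_tst = [r for i, r in enumerate(rois) if i not in tng_idx]
    return x_tng, x_tst
```

With N_B=60 and %Tng=5%, the sketch gives N_Tng=3 and N_Tst=57, the split used in Section 4.3.2.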

While the CBAD process is trained only on the Normal sequences, the training and testing selection process is performed on both the Normal and Anomalous sets so that all testing sets contain the same number of sequences. For clarity, let the normal test set be x_TstN[n], the Anom #1 condition test set be x_TstA1[n], and the Anom #2 condition test set be x_TstA2[n]. The normal-condition training set is denoted simply x_Tng[n], since the training sequences selected for the anomalous conditions are not used.

For each desired SNR, each of the N_Tst Norm, Anom #1, and Anom #2 testing sequences is combined with N_Nr independent AWGN realizations, for a total of N_TstRlz=N_Tst×N_Nr testing sequences per operating condition. For the purpose of this CBAD process discussion, the focus is on a single SNR, device, and operating condition combination. The steps in the CBAD process are implemented identically irrespective of which input sequences are used.
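One common way to combine a sequence with AWGN at a desired SNR is sketched below. It assumes SNR is defined as 10·log10(P_s/P_n), with P_s the mean power of the real-valued sequence, and it rescales each Gaussian realization so its sample power matches the target exactly. The function names, the use of Python's `random.Random.gauss`, and the exact-power rescaling are illustrative assumptions, not the original implementation.

```python
import math
import random


def mean_power(x):
    """Mean power (average squared value) of a real sequence."""
    return sum(v * v for v in x) / len(x)


def add_awgn(x, snr_db, n_nr, seed=None):
    """Return n_nr noisy copies of real sequence x at the requested SNR (dB).

    Assumption: SNR = 10*log10(P_signal / P_noise), and each noise
    realization is rescaled so that its sample power equals P_noise.
    """
    rng = random.Random(seed)
    p_noise = mean_power(x) / (10 ** (snr_db / 10))
    realizations = []
    for _ in range(n_nr):
        w = [rng.gauss(0.0, 1.0) for _ in x]           # white Gaussian draw
        scale = math.sqrt(p_noise / mean_power(w))     # force target noise power
        realizations.append([xi + scale * wi for xi, wi in zip(x, w)])
    return realizations
```

Calling this once per testing sequence yields the N_Tst×N_Nr noisy testing sequences described above (e.g., 57×10=570 in Section 4.3.2).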

3.8.2 Reference Sequence x_N[n] Generation. The next step is to generate the Normal reference sequence x_N[n] in Fig. 3.3 using the N_Tng normal sequences contained in the “Training” data set x_Tng[n]. Recall that the training set is composed entirely of sequences for the normal condition LLP Norm. As functionally denoted in (3.17), the CBAD process accepts two inputs, a reference sequence x_N[n] and an unknown collected sequence x_C[n], and outputs a single real-valued test statistic z_V:

$$ z_V = \mathrm{CBAD}\left(x_N[n],\, x_C[n]\right) \qquad (3.17) $$


The CBAD function is first used with the “Training” sequences x_Tng[n] as inputs to generate the desired Normal reference sequence x_N[n]. After setting the reference x_R[n]=x_N[n] as illustrated in Fig. 3.3, the CBAD function is then used to generate the collection of “Testing” verification test statistics z_V. The reference sequence x_R[n]=x_N[n] is generated as follows:


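1. Construct a set of N_Pot=N_Tng+1 potential reference sequences x_Pot[n] consisting of the N_Tng “Training” sequences x_Tng[n] and one additional sequence calculated as the average of the N_Tng training sequences. For each potential sequence x_Pot_i[n], compute the average verification statistic obtained when it serves as the reference and each of the other potential sequences serves as the collected sequence:

$$ \bar{z}_V[n_i] = \frac{1}{N_{Pot}-1} \sum_{\substack{j=1 \\ j \neq i}}^{N_{Pot}} \mathrm{CBAD}\left(x_N[n]=x_{Pot_i}[n],\; x_C[n]=x_{Pot_j}[n]\right), \quad i = 1, 2, \ldots, N_{Pot} \qquad (3.18) $$

The final normal reference sequence x_N[n] is selected from the set of potential sequences x_Pot[n] as described in Step 2.

2. Consider the set of average statistic values z̄_V[n] resulting from Step 1. The selected reference sequence is the potential reference sequence that, when used as the reference, yields the minimum average verification statistic:

$$ x_N[n] = x_{Pot_i}[n] \;\text{ such that }\; \bar{z}_V[n_i] = \min\left(\bar{z}_V[n_1], \bar{z}_V[n_2], \ldots, \bar{z}_V[n_{N_{Pot}}]\right) \qquad (3.19) $$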

Once the Norm reference is selected, it is used as the reference sequence for the remainder of CBAD processing.
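The two-step reference selection of (3.18) and (3.19) can be sketched as follows. This is a minimal sketch, not the original code: `select_reference` is a hypothetical name, the average potential sequence is assumed to be the sample-by-sample mean of the training sequences, and the statistic is passed in as a callable so that any implementation of the CBAD statistic (Sect. 3.8.3) can be plugged in.

```python
def select_reference(training, stat):
    """Select the Normal reference x_N[n] following (3.18)-(3.19).

    training : list of N_Tng equal-length training sequences x_Tng[n]
    stat     : callable stat(reference, collected) -> float, e.g. the CBAD
               statistic of Sect. 3.8.3
    Returns the selected reference and the average statistics z_bar.
    """
    n_tng = len(training)
    if n_tng < 1:
        raise ValueError("need at least one training sequence")
    length = len(training[0])
    # Potential references: the N_Tng training sequences plus their average.
    average = [sum(seq[k] for seq in training) / n_tng for k in range(length)]
    potentials = [list(seq) for seq in training] + [average]
    n_pot = len(potentials)  # N_Pot = N_Tng + 1

    # (3.18): average statistic with potential i as reference, all j != i as collected.
    z_bar = []
    for i, ref in enumerate(potentials):
        total = sum(stat(ref, potentials[j]) for j in range(n_pot) if j != i)
        z_bar.append(total / (n_pot - 1))

    # (3.19): keep the potential reference with the minimum average statistic.
    best = min(range(n_pot), key=lambda i: z_bar[i])
    return potentials[best], z_bar
```

Passing the statistic as a parameter keeps the selection logic independent of how z_V is computed; with training sequences that cluster around a common shape, the candidate closest to the cluster centre is selected.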

3.8.3 Test Statistic z_V Generation. The cross-correlation sequence C_NC[k] is generated for each test sequence using the selected reference sequence x_R[n] and the test sequence x_Tst[n] to be verified. This part of the process is completed for every sequence in the test set but is presented here for a single sequence to clearly outline the process. The resultant C_NC[k] is then subtracted from the reference autocorrelation sequence C_NN[k] to generate the correlation difference sequence C_Δ[k]=C_NN[k]−C_NC[k].

For a reference sequence x_R[n] and test sequence x_Tst[n] of equal size N_S, the correlation difference sequence C_Δ[k] consists of N_CorrSamp=2N_S−1 samples. To support the binary decision of declaring the sequence anomalous or normal, the correlation difference sequence C_Δ[k] is used to generate a single statistic value z_V. The verification test statistic is calculated using a pre-selected difference function f_Δ as z_V=f_Δ(C_Δ[k]). For all results presented here, the difference function is implemented as f_Δ(C_Δ[k])=‖C_Δ[k]‖, i.e., a simple 2-norm magnitude operation.
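A direct Python sketch of the statistic follows. It assumes real-valued, equal-length sequences and the unnormalized full-lag correlation c[k]=Σ_n a[n+k]·b[n] for k=−(N_S−1),…,N_S−1; the original processing may normalize the correlations differently or compute them with an FFT. The helper names are hypothetical.

```python
import math


def xcorr(a, b):
    """Full cross-correlation c[k] = sum_n a[n + k] * b[n], k = -(N-1)..(N-1).

    Assumes real-valued, equal-length sequences and no normalization.
    """
    n = len(a)
    if len(b) != n:
        raise ValueError("reference and test sequences must both have N_S samples")
    out = []
    for k in range(-(n - 1), n):
        acc = 0.0
        for m in range(max(0, -k), min(n, n - k)):  # keep both indices in range
            acc += a[m + k] * b[m]
        out.append(acc)
    return out


def correlation_difference(ref, test):
    """C_delta[k] = C_NN[k] - C_NC[k]; length 2*N_S - 1."""
    c_nn = xcorr(ref, ref)    # reference autocorrelation
    c_nc = xcorr(ref, test)   # reference/test cross-correlation
    return [p - q for p, q in zip(c_nn, c_nc)]


def cbad_statistic(ref, test):
    """z_V = f_delta(C_delta[k]) with f_delta the 2-norm magnitude."""
    c_delta = correlation_difference(ref, test)
    return math.sqrt(sum(d * d for d in c_delta))
```

A test sequence identical to the reference gives z_V=0; larger departures from the reference correlation structure give larger z_V. This direct form costs O(N_S²) operations per test sequence.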

Once the CBAD statistics have been generated, the input sequences x_Tst[n], x_Tng[n], and x_R[n] are no longer used.

3.8.4 Verification Threshold Determination. The next step in the CBAD process is to establish the desired verification threshold t_V. The previous step produces three CBAD statistic sets: 1) the statistic set for the Norm operating condition z_VN[n], 2) the statistic set for the Anom #1 operating condition z_VA1[n], and 3) the statistic set for the Anom #2 operating condition z_VA2[n]. Recall that each set is the same size and contains N_Tst statistics.

The interleaved selection of testing and training sequence sets was repeated for the statistic set z_VN[n]. The threshold value t_V was established using the resulting collection of training verification test statistics z_VTng[n] and the corresponding Probability Mass Function (PMF) P_Zv(z_VTng). For a desired FADR_D performance, the threshold t_V was set such that

$$ P\left[Z_{Tng} > t_V\right] \approx FADR_D \qquad (3.20) $$

where Z_Tng is the random variable whose distribution is defined by the observed PMF P_Zv(z_VTng) for the set of test statistics z_VTng[n].
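Because the PMF is empirical, (3.20) can generally be met only approximately. A minimal sketch, assuming t_V is chosen as the smallest observed training statistic for which the empirical fraction of statistics above it does not exceed FADR_D; the original may select or interpolate the threshold differently, and `verification_threshold` is a hypothetical name.

```python
import bisect


def verification_threshold(train_stats, fadr_d):
    """Smallest observed statistic t_V with P[Z_Tng > t_V] <= FADR_D (empirical PMF)."""
    if not train_stats:
        raise ValueError("need at least one training statistic")
    s = sorted(train_stats)
    n = len(s)
    for t in s:
        exceed = n - bisect.bisect_right(s, t)  # number of statistics strictly above t
        if exceed / n <= fadr_d:
            return t
    return s[-1]  # fallback, only reached if fadr_d < 0
```

For 100 training statistics and FADR_D=10%, the sketch returns the 90th smallest value, so that 10 of the 100 statistics lie above t_V.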

3.8.5 Anomalous vs. Normal Declaration. Declaring input sequence x_Tst[n] as Norm or Anom is based on the test statistic z_VTst derived from x_Tst[n]. The final declaration is made using a simple comparison of the input test statistic z_VTst with the established verification threshold t_V according to

$$ \begin{aligned} z_{VTst} < t_V &\;\rightarrow\; x_{Tst}[n]:\ \text{Normal} \\ z_{VTst} > t_V &\;\rightarrow\; x_{Tst}[n]:\ \text{Anomalous} \end{aligned} \qquad (3.21) $$
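The declaration rule, together with the FADR and TADR it produces over sets of Norm and Anom test statistics (the rates used in Sect. 3.10), can be sketched as follows. The function names are hypothetical, and a statistic exactly equal to t_V, a case (3.21) does not cover, is assumed here to be declared Normal.

```python
def declare_anomalous(z_tst, t_v):
    """(3.21): True -> Anomalous, False -> Normal (z_tst == t_v treated as Normal)."""
    return z_tst > t_v


def detection_rates(z_norm, z_anom, t_v):
    """FADR: fraction of Norm statistics declared Anomalous.
    TADR: fraction of Anom statistics declared Anomalous."""
    fadr = sum(declare_anomalous(z, t_v) for z in z_norm) / len(z_norm)
    tadr = sum(declare_anomalous(z, t_v) for z in z_anom) / len(z_anom)
    return fadr, tadr
```

For example, Norm statistics [1, 2, 3, 4] and Anom statistics [3, 5, 6, 7] with t_V=3.5 give FADR=0.25 and TADR=0.75.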


3.9 LLP Operation-by-Operation Processing

The CBAD process in Sect. 3.8 operates on entire input sequences and calculates a single CBAD statistic for the entire waveform. An alternate method for calculating CBAD statistics is to use multiple reference sequences consisting of stored sequences for each LLP operation in the Norm operating condition. The anomalous LLPs used to generate the collected emissions differ from the Norm operating condition LLP in either N_Op=1 operation (Anom #2) or N_Op=2 operations (Anom #1). The length, in samples, of the altered LLP operation dictates the number of samples that differ in the anomalous emissions. This step in the research effort focused on leveraging knowledge of the normal operating sequence, using N_Op=10 unique reference signals to evaluate each LLP operating region for anomalous (different from normal) behavior. Figure 3.13 shows the |H[x[n]]| emission sequence with the N_Op=10 operations clearly depicted. The change in emissions due to a single altered operation may not be enough to surpass the detection threshold t_V. Therefore, an Operation-by-Operation CBAD process is employed in which each operation is weighted equally when making the anomalous or normal decision, regardless of the fraction of total samples that the operation-attributable signal occupies.

Figure 3.13: Emission sequences with N_Op=10 LLP operations clearly attributable to specific subregions. Operations highlighted in red represent changes that were made to the Norm LLP to simulate anomalous operating conditions in the Anom #1 and Anom #2 LLPs.

The operation-by-operation implementation of the CBAD detection process computes multiple CBAD statistics arranged in a CBAD statistics vector < z_V1, z_V2, ..., z_VNOp >, where N_Op is the number of LLP operations in the normal operating condition program. The CBAD statistics are calculated for each of the Norm LLP operation regions shown in Fig. 3.13. Each delineated operation region has a reference emission used to calculate the CBAD statistic for that region. The N_Op Norm operations clearly align with the operation-by-operation regions, while the Anom #1 and Anom #2 operations do not.

Figure 3.14 illustrates the flow of the Operation-by-Operation CBAD process, showing the parallel CBAD statistic calculations used to generate the N_Op-element CBAD statistics vector. The function f_z(·) used to reduce the CBAD statistics vector < z_V1, z_V2, ..., z_VNOp > is the 2-norm magnitude function ‖< z_V1, z_V2, ..., z_VNOp >‖. The end result is still a single CBAD statistic used to declare the operating condition normal or anomalous based on a threshold t_V.


Figure 3.14: Parallel CBAD processing used to perform LLP operation-by-operation correlation. The branch test statistics (z_Vi) are used to form a composite CBAD test statistic vector for final verification assessment, with a 2-norm magnitude used to make the final Norm or Anom declaration.
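A minimal sketch of the Operation-by-Operation reduction is given below, assuming the operation region boundaries of the Norm emission are known as (start, stop) sample indices and each region has its own stored reference. The input format and the function name are hypothetical, and any per-region statistic (for instance the CBAD statistic of Sect. 3.8.3) can be passed as `stat`.

```python
import math


def op_by_op_statistic(test, boundaries, op_refs, stat):
    """Composite statistic ||< z_V1, ..., z_VNOp >||_2 (Sect. 3.9).

    test       : test sequence x_Tst[n]
    boundaries : list of (start, stop) sample indices, one per LLP operation
                 region of the Norm emission (hypothetical input format)
    op_refs    : per-operation reference sequences, in the same order
    stat       : per-region statistic, e.g. the CBAD statistic of Sect. 3.8.3
    Returns the composite statistic and the per-operation vector.
    """
    if len(boundaries) != len(op_refs):
        raise ValueError("need one reference per operation region")
    z_vec = [stat(ref, test[start:stop])
             for (start, stop), ref in zip(boundaries, op_refs)]
    return math.sqrt(sum(z * z for z in z_vec)), z_vec
```

Because each region contributes one element of the vector before the 2-norm is taken, a short altered operation carries the same weight as a long one. The intermediate vector is also the form of the CD feature sequence f_CD[n] used in Sect. 3.11.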


3.10  Performance  Evaluation 

Verification performance was evaluated for this research using 1) True Anomaly Detection Rate (TADR) vs. SNR performance curves, and 2) traditional Receiver Operating Characteristic (ROC) curves generated by plotting False Anomaly Detection Rate (FADR) vs. TADR based on discrete PMFs formed from selected test statistics.

3.10.1 Performance Curves. The TADR vs. SNR performance curve is generated by plotting the TADR for each SNR considered. Before the TADR values can be calculated, a threshold t_V must be established to determine which statistic values result in an anomalous declaration and which result in a normal declaration. Each SNR value considered has a unique threshold t_V, calculated for use with sequences at that SNR to provide FADR=10.0%. The arbitrary benchmark TADR_B=90.0% is used to determine the SNR value used for ROC curve generation.

3.10.2 CBAD Statistical PMFs. The experimental PMFs derived from the calculated CBAD statistics are used to generate the ROC curves. The PMFs are determined experimentally in keeping with accepted random process and signals methods [51]. They also provide a qualitative measure of separation between the CBAD statistic values associated with the different operating conditions or hardware devices. Because the PMFs depend on SNR, they are generated at a specific SNR: the lowest SNR that satisfies the benchmark TADR_B=90.0% specified in the previous section.

3.10.3 ROC Curve Assessment. The ROC curves are generated in keeping with accepted biometric standards and methods [48], using the results of the experimental PMF calculations. A ROC curve plots FADR vs. TADR and provides a means of comparing detectors based on the Equal Error Rate (EER). In keeping with accepted biometric verification standards and practices [48], the EER is the point at which the two errors associated with verification, FADR and the False Normal Verification Rate (FNVR), are equal. The arbitrary benchmark for the EER is EER_B=10.0%. This goal follows from the relationship between the FNVR and the performance benchmark of TADR_B=90.0%: FNVR=1−TADR.

ROC curves are generated by varying the threshold t_V and calculating the FADR and TADR values for each threshold. Consider a set of normal CBAD statistics z_N[n] with N_zN elements and a set of anomalous CBAD statistics z_A[n] with N_zA elements. Let t_V[n] be the set of N_V threshold values arranged in ascending order such that t_V[n_i]<t_V[n_j] for 1≤i<j≤N_V. Consider the union of the Norm z_N[n] and Anom z_A[n] CBAD statistics,

$$ z_U = \left\{ z_N[n_1], z_N[n_2], \ldots, z_N[n_{N_{zN}}],\; z_A[n_1], z_A[n_2], \ldots, z_A[n_{N_{zA}}] \right\} \qquad (3.22) $$

The number of threshold values N_V in the set t_V[n] used to generate the ROC curve is dictated by the desired ROC resolution; for this research, N_V=100 threshold values were sufficient for ROC curve analysis. The threshold values are based on the values in z_U, with t_V[n_1] set equal to the minimum value of z_U,

$$ t_V[n_1] = \min\{z_U\} \qquad (3.23) $$

and the remaining elements defined by

$$ t_V[n_i] = t_V[n_{i-1}] + \frac{\max\{z_U\} - \min\{z_U\}}{N_V - 1}, \quad i = 2, 3, \ldots, N_V \qquad (3.24) $$

3.11  GRLVQI  Processing 

A majority of the initial research activity focused on software anomaly detection: discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc. However, an important parallel avenue of research was developed to support hardware device discrimination: discriminating between various hardware components to detect malfunctioning, counterfeit, trojan, etc., Integrated Circuits (ICs).

It was determined that the proposed verification-based anomaly detection process was well suited to the hardware device discrimination task, and an initial proof-of-concept demonstration was conducted using the GRLVQI process developed in [76]. The process was not modified under this effort, so only minimal details are presented here. The GRLVQI process is inherently signal agnostic and can accept any type of sequence as input. For the demonstrations here, two specific types of input sequences were considered: 1) TD feature sequences and 2) Correlation Domain (CD) feature sequences. The TD feature sequences f_TD[n] were generated using the RF-DNA process in Sect. 3.6.2 with N_R=12 subregions plus the total response, all three instantaneous features, and all four statistics, for a total of (12+1)×3×4=156 features in each Composite Fingerprint sequence.

To demonstrate hardware device discrimination, a single LLP was used to generate sequences for multiple PLC devices, with the goal of holding operating conditions constant so that discrimination was based on device hardware. The N_Op=10 LLP for Normal operating conditions was used to generate the RF-DNA f_TD[n] feature sequences for use in the hardware discrimination portion of the research.

In addition to TD feature sequences, the GRLVQI method of verification was evaluated using CD feature sequences f_CD[n] that were generated using the Operation-by-Operation CBAD process described in Sect. 3.9. Instead of creating a single CBAD statistic z_V, a collection of CBAD statistics {z_V1, z_V2, ..., z_V10} was generated using the N_Op=10 LLP. These CBAD statistic sequences f_CD[n]={z_V1, z_V2, ..., z_V10} were used as input sequences for GRLVQI verification performance assessment.

Performance of the GRLVQI process was evaluated for both TD f_TD[n] and CD f_CD[n] feature sequences using ROC curves and the benchmark performance criteria presented in Sect. 3.10.3.

4. Results

This chapter provides research results for software anomaly detection and hardware component discrimination based on the methodology presented in Chapter 3. Section 4.1 first introduces the various Programmable Logic Controller (PLC) response sequences used for generating results and assessing performance, and Section 4.2 defines the performance evaluation criteria. Results for software anomaly detection via verification using the Correlation Based Anomaly Detection (CBAD) process are presented in Section 4.3 for Time Domain (TD) PLC input sequences, Section 4.4 for statistical Radio Frequency Distinct Native Attributes (RF-DNA) input sequences, and Section 4.5 for Hilbert transformed input sequences. The chapter concludes with Section 4.6, which demonstrates hardware component discrimination via verification using a Generalized Relevance Learning Vector Quantization-Improved (GRLVQI) classification process with both TD and Correlation Domain (CD) statistical features as inputs.

4.1 PLC Response Sequences

The experimental methodology described in Chapter 3 is first used to demonstrate the applicability of the CBAD process for reliably detecting anomalous software operating conditions. This is done using the specific ladder logic programs described in Section 3.2 and the collected signals described in Section 3.7.

Results for this research are based on unintentional TD emissions collected from Allen Bradley SLC-500 PLC Central Processing Unit (CPU) modules. These emissions are sampled, stored, and processed post-collection using the methodology and configurations specified in Chapter 3. The emissions are collected from selected PLC devices executing Ladder Logic Programs (LLPs) with N_Op=5 and N_Op=10 operations. The term burst is used here as a general term for a collected, sampled, and post-collection processed emission. Specific details of the LLPs used to generate PLC emissions are provided in Section 3.2.


Four types of PLC response sequences were generated from the experimentally collected TD emissions, and each was used in some manner to evaluate the CBAD and/or GRLVQI processes: 1) the TD magnitude response sequence |x[n]|, 2) the magnitude of the Hilbert transformed TD response sequence |H[x[n]]|, 3) the statistical RF-DNA TD response sequence f_TD[n], and 4) the CD response sequence f_CD[n]. All four types of sequences served as input sequences for evaluation and were derived from PLC bursts as described in Chapter 3.

4.2 Performance Evaluation Criteria

Three evaluation criteria were used to assess software anomaly detection performance relative to an arbitrary Benchmark (B) defined by 1) Signal-to-Noise Ratio (SNR_B), 2) True Anomaly Detection Rate (TADR_B), and 3) Equal Error Rate (EER_B). The following steps were used to derive the resultant performance metrics relative to the established benchmark:

1. Generate verification results for varying SNR using a given pairing of anomaly detection method and input sequence type, and plot SNR vs. TADR.

2. Determine the lowest SNR at which the plotted TADR=TADR_B. An arbitrary TADR_B≥90.0% benchmark was chosen here for assessment. The corresponding SNR_B at which TADR=TADR_B is used for Receiver Operating Characteristic (ROC) curve generation.

3. Generate a ROC curve by plotting False Anomaly Detection Rate (FADR) vs. TADR at SNR_B. The corresponding Equal Error Rate (EER) point is determined as the point on the ROC curve at which FADR=FNVR=1−TADR, where FNVR is the False Normal Verification Rate. An arbitrary benchmark of EER_B≤10.0% was chosen here for assessment.

Performance: A given anomaly detection method and input sequence pair is deemed inadequate if it does not achieve both the arbitrary TADR_B≥90.0% and EER_B≤10.0% benchmarks.
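Steps 2 and 3 and the adequacy rule can be sketched as below, assuming TADR and EER are expressed as fractions and the TADR vs. SNR curve is sampled on an ascending SNR grid. Estimating SNR_B by linear interpolation between the two grid points that bracket the benchmark is an assumption made for illustration, and the function names are hypothetical.

```python
def lowest_snr_meeting(snrs_db, tadrs, tadr_b=0.90):
    """Step 2: lowest SNR at which TADR reaches TADR_B, or None if it never does.

    Assumes snrs_db is ascending; linearly interpolates between the last grid
    point below the benchmark and the first grid point at or above it.
    """
    for k, (snr, tadr) in enumerate(zip(snrs_db, tadrs)):
        if tadr >= tadr_b:
            if k == 0:
                return snr
            snr0, tadr0 = snrs_db[k - 1], tadrs[k - 1]
            frac = (tadr_b - tadr0) / (tadr - tadr0)
            return snr0 + frac * (snr - snr0)
    return None


def is_adequate(snr_b, eer, eer_b=0.10):
    """Performance rule: inadequate if TADR_B is never reached or EER exceeds EER_B."""
    return snr_b is not None and eer <= eer_b
```

Under this rule a pairing whose TADR curve never reaches 90.0%, as in Section 4.3.2, is inadequate regardless of its EER.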

4.3 Software Anomaly Detection: TD Sequences

The first evaluations of the CBAD process were conducted using TD magnitude sequences |x[n]| derived from sampled, post-collection processed PLC emissions as described in Section 3.5; the input TD sequences are simply transformed by taking the magnitude of each sample (|x[n]|), with no Hilbert transform applied. Two methods were initially used to evaluate the plausibility of using CBAD processing to detect changes in the PLC operating condition: 1) N_B=1 TD magnitude sequence |x[n]| combined with N_Nr=200 Additive White Gaussian Noise (AWGN) realizations to achieve the desired Analysis Signal-to-Noise Ratio (SNR_A), and 2) N_B=60 TD magnitude sequences |x[n]| combined with N_Nr=10 AWGN realizations to achieve the desired analysis SNR_A. For both methods, only one PLC device was used to generate the |x[n]| input sequences.


Notation: Unless noted otherwise, SNR is used exclusively to represent SNR_A throughout the remainder of the document.


Presentation: Subsequent use of N_Nr notation refers to the total number of independent, randomly generated AWGN noise realizations {x_B1[n], x_B2[n], ..., x_BNNr[n]} used to power-scale selected sequences to evaluate performance at the desired SNR.


4.3.1 Single Device, N_B=1, N_Op=5. The anomaly detection process was initially assessed using a single (N_B=1), representative PLC TD magnitude sequence |x[n]| from the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 operations. For single response detection, the same burst with varying analysis SNR was used for both training and testing. This was done to demonstrate the impact of SNR variation and noise degradation on CBAD performance without the effects of input signal variation. The initial demonstration was performed on a single PLC device denoted as WQ. The anomaly detection process was repeated for SNR∈[−30.0, 30.0] dB using N_Nr=200 AWGN noise realizations per SNR. This yielded a total of N_z=200 independent CBAD verification statistics (z_V) for each operating condition at each SNR considered.

Figure 4.1 shows anomaly detection SNR vs. TADR performance for SNR∈[−30.0, 30.0] dB. As indicated, the TADR_B≥90.0% benchmark is achieved for SNR≥−10.0 dB. Based on these results, anomaly detection ROC performance was evaluated at SNR=−10.0 dB using TD magnitude sequence |x[n]| and the same conditions used for the Fig. 4.1 results, i.e., the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. ROC performance results are presented in Fig. 4.2 and reflect EER<3.2%, which meets the EER_B≤10.0% benchmark.

Figure 4.1: SNR vs. TADR performance using TD magnitude sequence |x[n]| for N_B=1 burst with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The TADR_B≥90.0% benchmark is achieved for SNR≥−10.0 dB.


Figure 4.2: Anomaly detection ROC curve for the SNR=−10.0 dB operating point in Fig. 4.1. Results generated using the TD magnitude sequence |x[n]| with N_B=1 burst and the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The EER_B≤10.0% benchmark is achieved.

4.3.2 Single Device, N_B=60, N_Op=5. The anomaly detection process performed in Section 4.3.1 was repeated using N_B=60 TD sequences per operating condition for the Norm, Anom #1, and Anom #2 N_Op=5 operating conditions. N_Tng=3 bursts were selected as training bursts, leaving N_Tst=57 bursts per operating condition for CBAD processing evaluation. As in the single response case, the multiple response process was repeated for SNR∈[−30.0, 30.0] dB using representative bursts collected from the WQ device. For each SNR considered, anomaly detection was based on N_Nr=10 AWGN noise realizations per SNR. This yielded a total of N_z=570 test statistics for each operating condition at each SNR considered.

Figure 4.3 shows SNR vs. TADR performance for SNR∈[−30.0, 30.0] dB. The TADR_B≥90.0% benchmark is not achieved for any SNR considered. This is due to variation in the N_B=60 TD waveforms: each burst represents a unique collected signal whose content, while attributable to the operations in the LLP, is not identical to the content in the other bursts. The CBAD process is designed to detect variations from the normal condition, and the N_Tst=57 test input sequences vary enough from the N_Tng=3 training input sequences that even the Norm sequences are incorrectly declared anomalous.


Figure 4.3: SNR vs. TADR performance using TD magnitude sequence |x[n]| for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The TADR_B≥90.0% benchmark is not achieved for any SNR∈[−30.0, 30.0] dB.


Based on the results in Fig. 4.3, ROC curve performance for the multiple response TD waveform bursts was assessed at SNR=30.0 dB. Although TADR performance did not achieve the TADR_B≥90.0% benchmark for any SNR considered, SNR=30.0 dB yielded the highest TADR performance and was chosen to complete the ROC analysis. As seen in Fig. 4.4, the EER_B≤10.0% benchmark is not achieved at SNR=30.0 dB.

Figure 4.4: Anomaly detection ROC curve for the SNR=30.0 dB operating point in Fig. 4.3. Results obtained using TD magnitude sequence |x[n]| for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The EER_B≤10.0% benchmark is not achieved.

Figure 4.5 shows the experimentally derived Probability Mass Functions (PMFs) P[Z_V=z_V] for the pool of test statistics under Norm, Anom #1, and Anom #2 operating conditions using N_B=60 TD bursts and N_Nr=10 AWGN noise realizations scaled to achieve SNR=30.0 dB. Due to variation in the collected waveform responses under the specified operating conditions, the variance in z_V here is greater than that observed for the N_B=1 case. As the PMFs indicate, the Norm condition z_V range significantly overlaps the z_V ranges for both the Anom #1 and Anom #2 conditions; the normal and anomalous z_V ranges are not completely separable for any SNR considered.


Performance: Untransformed TD sequences were insufficient for reliably detecting anomalous operating conditions, and the desired benchmark performance was not achieved using multiple bursts [87,89].


4.4 Software Anomaly Detection: RF-DNA Sequences

Failure of the software anomaly detection process when using multiple collected PLC emissions motivated the need for an alternate representation of anomalous and normal operating conditions. Previous research efforts have found success with classification and verification processes based on statistical features extracted from collected waveforms [11,58,79,102]. The next step in this research effort was therefore to consider using sequences formed from the statistical features of the waveforms as the input to the anomaly detection process.

Figure 4.5: Representative PMFs for the PLC WQ device operating under Norm, Anom #1, and Anom #2 operating conditions using N_B=60 TD bursts and N_Nr=10 AWGN noise realizations scaled to achieve SNR=30.0 dB. There is significant overlap between the anomalous and normal z_V PMFs; perfect anomalous-normal separation and reliable verification are not achievable.

As stated in Section 3.3, the anomaly detection process is signal agnostic and can operate on any discrete input sequence. For the feature-based anomaly detection process, the input sequences (the training, reference, and collected sequences) are series of values representing the statistical attributes of the corresponding TD sequences, i.e., RF-DNA feature sequences f_TD[n]. As outlined in Section 3.6.2, the Feature Extraction and Statistical Fingerprint Generation processes are used to create a Composite Fingerprint based on the collected emission [11,103]. The Composite Fingerprint reduces the dimensionality of the sequence used in the anomaly detection process. The TD sequences considered for this research are represented by an N_D=7500 dimensional vector, where the dimensionality is a function of the sampling rate f_s and time length T_wf of the TD emission:

$$ N_D = f_s \times T_{wf} \qquad (4.1) $$

The process is graphically depicted in Fig. 3.11 and produces an N_D=156 dimensional vector using N_R=12 subregions plus the total signal, N_Stat=4 statistical attributes per region, and N_Feat=3 signal attributes per region, as given by (4.2). The composite fingerprint feature vector f_TD[n] serves as the input sequence to the anomaly detection process.

$$ N_D = (N_R + 1) \times N_{Stat} \times N_{Feat} = 13 \times 4 \times 3 = 156 \qquad (4.2) $$
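A sketch of a Composite Fingerprint calculation that yields the N_D=(N_R+1)×N_Stat×N_Feat elements of (4.2) is given below. The specific choices are assumptions not stated in this section: the three instantaneous features (commonly instantaneous amplitude, phase, and frequency in RF-DNA work) are taken as precomputed inputs, the four statistics are assumed to be standard deviation, variance, skewness, and kurtosis, the N_R subregions are assumed to be contiguous and of (near) equal length, and the feature ordering and function names are hypothetical.

```python
import math


def stats4(seg):
    """Standard deviation, variance, skewness, kurtosis of one region (assumed set)."""
    n = len(seg)
    mu = sum(seg) / n
    var = sum((v - mu) ** 2 for v in seg) / n
    std = math.sqrt(var)
    if std == 0:
        return [0.0, 0.0, 0.0, 0.0]  # constant region: define all statistics as zero
    skew = sum((v - mu) ** 3 for v in seg) / (n * std ** 3)
    kurt = sum((v - mu) ** 4 for v in seg) / (n * var ** 2)
    return [std, var, skew, kurt]


def composite_fingerprint(inst_features, n_r=12):
    """Concatenate N_Stat statistics over N_R subregions plus the total response.

    inst_features : list of N_Feat equal-length instantaneous feature sequences,
                    assumed precomputed from the TD emission
    Returns a list of (n_r + 1) * 4 * len(inst_features) values.
    """
    fingerprint = []
    for feat in inst_features:
        n = len(feat)
        bounds = [round(r * n / n_r) for r in range(n_r + 1)]
        regions = [feat[bounds[r]:bounds[r + 1]] for r in range(n_r)]
        regions.append(feat)  # total response as the (N_R + 1)-th region
        for seg in regions:
            fingerprint.extend(stats4(seg))
    return fingerprint
```

With three features of N_D=7500 samples and N_R=12, each subregion holds 625 samples and the fingerprint has 156 elements, matching (4.2).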

4.4.1 Single Device, N_B=60, N_Op=5. The anomaly detection process in Section 4.3.2 was repeated using the same N_B=60 bursts per operating condition for the Norm, Anom #1, and Anom #2 conditions with N_Op=5 operations. N_Tng=3 bursts per operating condition were selected for training and N_Tst=57 bursts per operating condition were selected for testing to evaluate CBAD processing. The multiple burst processing was repeated for SNR ∈ [-25.0, 25.0] dB using N_Nr=10 AWGN noise realizations per SNR. This yielded a total of N_Z=570 test statistics for each set of RF-DNA feature vectors {f_TD1[n], f_TD2[n], ..., f_TD570[n]} under each operating condition at each SNR considered. The process differs from the TD waveform process in that a Composite Fingerprint f_TD is generated for each waveform, and this Composite Fingerprint is used as the input sequence to the anomaly detector.
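Each noisy test sequence combines a collected burst with one of N_Nr AWGN realizations at a target SNR. The sketch below is a minimal illustration assuming SNR is defined as the ratio of the average power of the whole real-valued burst to the noise power; the thesis's exact SNR definition is not restated here, and the helper names `add_awgn` and `noisy_realizations` are hypothetical.

```python
import numpy as np


def add_awgn(x, snr_db, rng):
    """Add real white Gaussian noise so that mean(x**2) / noise_power = SNR."""
    x = np.asarray(x, dtype=float)
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return x + rng.normal(scale=np.sqrt(noise_power), size=x.shape)


def noisy_realizations(x, snr_db, n_nr, seed=0):
    """Return an (n_nr, len(x)) array of independent noisy copies of x."""
    rng = np.random.default_rng(seed)
    return np.stack([add_awgn(x, snr_db, rng) for _ in range(n_nr)])
```

With N_Tst=57 test bursts and N_Nr=10 realizations per burst, this bookkeeping produces the 57 × 10 = 570 test statistics per operating condition and SNR quoted above.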

Figure 4.6 shows the resultant SNR vs. TADR for SNR ∈ [-25.0, 25.0] dB. As indicated, the TADR_B > 90.0% benchmark is achieved for SNR > 8.2 dB.

Based on performance in Fig. 4.6, ROC curve performance for the N_B=60 case was assessed for SNR=8.2 dB. The resultant ROC curve in Fig. 4.7 shows EER < 10.0%, which satisfies the EER_B < 10.0% benchmark.
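The EER is the ROC operating point at which the false positive rate (normal bursts declared anomalous) equals the false negative rate (anomalous bursts declared normal). A minimal sketch of computing it from two sets of test statistics follows. It assumes, hypothetically, that larger statistics indicate anomalous operation; if the CBAD statistic behaves as a similarity, negate the scores first. The function name `roc_eer` is illustrative only.

```python
import numpy as np


def roc_eer(normal_scores, anomalous_scores):
    """Sweep thresholds; a score >= threshold is declared anomalous (assumed convention).

    Returns (false positive rates, true positive rates, EER).
    """
    normal = np.asarray(normal_scores, dtype=float)
    anomalous = np.asarray(anomalous_scores, dtype=float)
    thresholds = np.unique(np.concatenate([normal, anomalous]))
    # False positive rate: normal bursts declared anomalous.
    fpr = np.array([np.mean(normal >= t) for t in thresholds])
    # False negative rate: anomalous bursts declared normal.
    fnr = np.array([np.mean(anomalous < t) for t in thresholds])
    i = np.argmin(np.abs(fpr - fnr))  # threshold where the two error rates cross
    eer = 0.5 * (fpr[i] + fnr[i])
    return fpr, 1.0 - fnr, eer
```

Sweeping only the observed score values keeps the sketch exact for small test sets; with thousands of statistics per condition, a vectorized sort-based sweep would be faster but gives the same curve.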

TD waveform magnitude sequences |x[n]| are not an effective input for the CBAD process due to variation between collected bursts. When evaluating the potential for using RF-DNA features as input sequences, a specific feature sequence was selected as the reference based on observed TADR performance. The results indicate that using the RF-DNA features f_TD[n] as input sequences improves performance relative to using the TD waveform magnitude sequences, provided the reference burst is specifically selected based on analysis of both the normal and anomalous bursts. However, the envisioned approach is for training to rely on known normal bursts only.

Figure 4.6: SNR vs. TADR performance using TD magnitude sequence |x[n]| for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The TADR_B > 90.0% benchmark is achieved for SNR > 8.2 dB.

Figure 4.8 shows TADR results for SNR ∈ [-30.0, 30.0] dB when the CBAD reference selection process automatically selects the reference burst by training on observed normal conditions only. The TADR_B > 90.0% benchmark is not achieved for any SNR ∈ [-30.0, 30.0] dB.
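The CBAD reference selection process itself is defined elsewhere and is not reproduced here. Purely as a hypothetical illustration of choosing a reference from normal training bursts only, the sketch below picks the medoid-like burst whose average Pearson correlation with the other normal bursts is highest; this rule and the name `select_reference` are assumptions, not the author's method.

```python
import numpy as np


def select_reference(normal_bursts):
    """Return the index of the normal burst most correlated, on average, with the others.

    Hypothetical medoid-style rule; requires at least two bursts (one per row).
    """
    bursts = np.asarray(normal_bursts, dtype=float)
    n = bursts.shape[0]
    rho = np.corrcoef(bursts)                       # pairwise Pearson correlations
    mean_rho = (rho.sum(axis=1) - 1.0) / (n - 1)    # drop self-correlation (= 1)
    return int(np.argmax(mean_rho))
```

A rule of this kind uses no anomalous data, which matches the envisioned normal-only training, but it favors the most "typical" normal burst rather than the burst that best separates normal from anomalous operation. That distinction is one plausible reason a normal-only reference can underperform a hand-picked one.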

The resultant TADR performance in Fig. 4.8 falls short of the TADR_B > 90.0% benchmark for all SNR considered. Because SNR=30.0 dB yielded the highest TADR, it was used to generate the ROC curve results for the N_B=60 case shown in Fig. 4.9. The EER_B < 10.0% benchmark is not achieved for any SNR ∈ [-30.0, 30.0] dB.

Figure 4.7: Anomaly detection ROC curve for SNR=8.2 dB operating point in Fig. 4.6. Results obtained using TD RF-DNA feature sequences f_TD[n] for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The EER_B < 10.0% benchmark is achieved.
With feature-based detection, the selection of the reference burst substantially affects performance. In the envisioned use case, the reference would be built from observed normal operation using an automated process. A more robust method of detecting anomalous behavior is therefore required.

Performance: The RF-DNA feature sequences were insufficient for reliably detecting anomalous operating conditions, and the desired benchmark performance was not achieved using multiple bursts [87,89].


4.5 Software Anomaly Detection: Hilbert Sequences

The failure of waveform-based anomaly detection for multiple collected response waveforms, together with the lack of robust characteristics when using features, necessitates another means of representing the normal and anomalous operating conditions. The Hilbert transform is used in audio signal processing applications to stabilize a signal's amplitude estimates [31,71]. The next step in this research effort is therefore to evaluate the feasibility of using the Hilbert transform to improve anomaly detection performance. The Hilbert transform is performed as specified in Section 3.6.1 on TD waveform sequences x[n] to generate Hilbert transformed magnitude sequences |H[x[n]]|. The input sequence for TD-based anomaly detection is the magnitude of the collected real-valued TD emission, |x[n]|. For brevity, these are denoted TD sequences |x[n]| to differentiate them from the corresponding Hilbert sequences |H[x[n]]|.

Figure 4.8: SNR vs. TADR performance using TD magnitude sequence |x[n]| for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions, N_Op=5 LLP operations, with training based only on Norm input sequences. The TADR_B > 90.0% benchmark is not achieved for any SNR ∈ [-30.0, 30.0] dB.
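A Hilbert magnitude sequence can be sketched with SciPy. Note that `scipy.signal.hilbert` returns the analytic signal x[n] + jH[x[n]], not H[x[n]] itself, so the Hilbert transform proper is its imaginary part. Whether Section 3.6.1 uses |H[x[n]]| literally or the analytic-signal envelope is not restated here, so this minimal sketch (hypothetical helper `hilbert_sequences`) returns both.

```python
import numpy as np
from scipy.signal import hilbert


def hilbert_sequences(x):
    """Return (|H[x[n]]|, envelope |x[n] + jH[x[n]]|) for a real-valued sequence."""
    analytic = hilbert(np.asarray(x, dtype=float))  # analytic signal x[n] + jH[x[n]]
    h = np.imag(analytic)                           # Hilbert transform H[x[n]]
    return np.abs(h), np.abs(analytic)
```

For a cosine with an integer number of cycles in the record, H[x[n]] is the matching sine and the envelope is constant, which is the amplitude-stabilizing behavior cited above.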


4.5.1 Single Device, N_B=60, N_Op=5. To evaluate the impact of noise-degraded signals on the anomaly detection process, the CBAD process is performed using Hilbert transformed magnitude input sequences |H[x[n]]|, generated by taking the Hilbert transform of TD waveform sequences combined with AWGN sequences for SNR ∈ [-30.0, 30.0] dB, as in the waveform input sequence analysis. There are a total of N_B=60 TD waveforms, with N_Tng=3 bursts selected as training bursts, leaving N_Tst=57 bursts per operating condition for CBAD processing evaluation. For each SNR considered, the anomaly detection process used N_Nr=10 AWGN realizations. This yielded a total of N_Z=570 Hilbert sequences and associated CBAD test statistics for each permutation of operating condition, device, and SNR considered.

Figure 4.9: Anomaly detection ROC curve for SNR=30.0 dB operating point in Fig. 4.8. Results obtained using TD RF-DNA feature sequences f_TD[n] for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The EER_B < 10.0% benchmark is not achieved.

Figure 4.3 shows results for the anomaly detection process when the TD sequences are used as inputs. Using the TD sequences results in an unacceptable anomaly detection rate of TADR < 90.0% for all SNR considered.

The anomaly detection process was repeated using the same N_B=60 collected PLC emissions per operating condition, per device, under the Norm, Anom #1, and Anom #2 operating conditions. The same SNR levels and AWGN noise realizations were used for each operating condition and each device to generate the Hilbert magnitude sequences {|H[x_1[n]]|, |H[x_2[n]]|, ..., |H[x_60[n]]|}.

Figure 4.10 shows TADR results for SNR ∈ [-30.0, 30.0] dB when the Hilbert sequences |H[x[n]]| are used as inputs. The TADR_B > 90.0% benchmark is achieved for SNR > 0.0 dB.
Figure 4.10: SNR vs. TADR performance using Hilbert magnitude sequences |H[x[n]]| for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations, with training based only on Norm input sequences. The TADR_B > 90.0% benchmark is achieved for SNR > 0.0 dB.


Based on performance in Fig. 4.10, ROC curve performance for the N_B=60 case was assessed for SNR=0.0 dB. The resultant ROC curve in Fig. 4.11 shows that the EER_B < 10.0% benchmark was achieved.

4.5.2 Ten Devices, N_B=1000, N_Op=10. Previous results were based on input sequences x[n] from a single PLC device (WQ) using the N_Op=5 LLP operations shown in Fig. 2(a). For the following results, the device pool was increased to N_Dev=10 PLC devices {WQ, WV, KG, QI, KV, OV, RG, ZC, ZZ, ZA} of the same brand and model number, as summarized in Table 3.1. Additionally, the LLPs used for simulating the Norm, Anom #1, and Anom #2 operating conditions were based on the N_Op=10 operations shown in Fig. 2(b).

Figure 4.11: Anomaly detection ROC curve for SNR=0.0 dB operating point in Fig. 4.10. Results obtained using Hilbert magnitude sequences |H[x[n]]| for N_B=60 bursts with the PLC operating under Norm, Anom #1, and Anom #2 conditions using N_Op=5 LLP operations. The EER_B < 10.0% benchmark is achieved.

To evaluate the impact of noise-degraded signals on the anomaly detection process, CBAD processing was performed using Hilbert transformed input sequences |H[x[n]]|, generated by taking the Hilbert transform of TD waveform sequences combined with AWGN sequences for SNR ∈ [-30.0, 30.0] dB, as in the waveform input sequence and previous Hilbert transform-based analyses. Results in previous sections were based on either N_B=1 or N_B=60 bursts. Here, the anomaly detection process was performed using N_B=1000 collected PLC emissions per operating condition per device for the Norm, Anom #1, and Anom #2 operating conditions. A total of N_Tng%=5%, or N_Tng=50, Hilbert sequences were selected as training bursts, leaving N_Tst=950 sequences per operating condition for the CBAD processing evaluation. For each SNR considered, the anomaly detection process was implemented using N_Nr=10 AWGN noise realizations. This yielded N_Z=9500 total test statistics for each permutation of operating condition, device, and SNR considered. The same SNR levels and AWGN noise realizations were used for each operating condition and each device at each SNR to generate the Hilbert test sequences {|H[x_1[n]]|, |H[x_2[n]]|, ..., |H[x_950[n]]|}.
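The training/test bookkeeping above can be checked with a short sketch. It assumes training bursts are drawn at random, which the text does not specify, and the helper names `split_bursts` and `n_test_statistics` are hypothetical. Note that N_Tng=3 of N_B=60 in Sect. 4.5.1 is also a 5% training fraction.

```python
import numpy as np


def split_bursts(n_bursts, train_fraction, seed=0):
    """Randomly partition burst indices into disjoint training and test sets (assumed policy)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_bursts)
    n_train = int(round(n_bursts * train_fraction))
    return np.sort(order[:n_train]), np.sort(order[n_train:])


def n_test_statistics(n_test, n_nr):
    """One CBAD test statistic per test burst per AWGN realization."""
    return n_test * n_nr
```

For N_B=1000 and a 5% training fraction this gives 50 training and 950 test bursts, and 950 × 10 = 9500 test statistics per operating condition, device, and SNR.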

Figure 4.12 shows TADR results for SNR ∈ [-30.0, 30.0] dB when the Hilbert sequences |H[x[n]]| are used as inputs. The TADR_B > 90.0% benchmark is achieved for SNR > 5.0 dB and all N_Dev=10 devices.


Figure 4.12: SNR vs. TADR performance using Hilbert magnitude sequences |H[x[n]]| for N_B=1000 bursts with all PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations, with training based only on Norm input sequences. The TADR_B > 90.0% benchmark is achieved for all devices at SNR > 5.0 dB.


Based on performance in Fig. 4.12, ROC curve performance for the N_B=1000 case was assessed for SNR=5.0 dB. The resultant ROC curves in Fig. 4.13 show that the arbitrary EER_B < 10.0% benchmark is achieved for all N_Dev=10 devices considered.

Figure 4.13: Anomaly detection ROC curves for SNR=5.0 dB operating point in Fig. 4.12. Results obtained using Hilbert magnitude sequences |H[x[n]]| for N_B=1000 bursts with the PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The EER_B < 10.0% benchmark is achieved for all devices.


Performance: The Hilbert Transform Feature Sequences with cross-operation CBAD processing were sufficiently robust for reliably detecting anomalous operating conditions. The desired TADR_B > 90.0% and EER_B < 10.0% performance benchmarks were achieved using 1) N_B=60 sequences for SNR > 0.0 dB, and 2) N_B=1000 sequences for SNR > 5.0 dB.


The operation-by-operation CBAD processing in Sect. 3.9 effectively weights the differences for each operation (as quantified by CBAD statistic z_v) equally, irrespective of how much of the total operating condition sequence is attributable to the specific operation.
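To illustrate why equal weighting matters, the sketch below contrasts a single whole-sequence statistic with an equal-weight average of per-operation statistics. The statistic z = 1 - ρ (one minus the Pearson correlation between test and reference within a region) is a hypothetical stand-in for the Sect. 3.9 CBAD statistic, and the operation region boundaries are assumed known; all function names are illustrative.

```python
import numpy as np


def region_statistics(test, ref, bounds):
    """Per-operation z = 1 - Pearson correlation (hypothetical CBAD stand-in).

    bounds: increasing sample indices [b0, b1, ..., bK] delimiting K operation regions.
    """
    test = np.asarray(test, dtype=float)
    ref = np.asarray(ref, dtype=float)
    z = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        rho = np.corrcoef(test[start:stop], ref[start:stop])[0, 1]
        z.append(1.0 - rho)
    return np.array(z)  # one statistic per operation region


def equal_weight_statistic(test, ref, bounds):
    """Average of per-operation statistics: every operation counts equally."""
    return float(np.mean(region_statistics(test, ref, bounds)))


def whole_sequence_statistic(test, ref):
    """Single statistic over the full sequence: long operations dominate."""
    return float(1.0 - np.corrcoef(test, ref)[0, 1])
```

If a test burst differs from the reference only within a 10-sample operation of a 1000-sample sequence, the whole-sequence statistic stays near zero, while that operation's z reaches its maximum and contributes half of a two-region equal-weight average. The per-region array returned by `region_statistics` is also the form of an N_Op-dimensional statistic vector.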
Figure 4.14 shows TADR results for SNR ∈ [-30.0, 30.0] dB when the Hilbert sequences |H[x[n]]| are used as inputs and CBAD statistics are calculated for each operation region {Reg_Op1, Reg_Op2, ..., Reg_Op10}. The TADR_B > 90.0% benchmark is achieved for SNR > 0.0 dB for all N_Dev=10 devices considered. This represents a gain of SNR_Gain=5.0 dB compared with the results obtained without the operation-by-operation process.


Figure 4.14: SNR vs. TADR performance for Operation-by-Operation CBAD Processing using Hilbert magnitude sequences |H[x[n]]|. Results for N_B=1000 bursts with all PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The TADR_B > 90.0% benchmark is achieved for all devices at SNR > 0.0 dB.


Based on performance in Fig. 4.14, ROC curve performance for the N_B=1000 case was assessed for SNR=0.0 dB. The resultant ROC curves in Fig. 4.15 show EER < 6.3% for all N_Dev=10 devices, and the EER_B < 10.0% benchmark is achieved.

Figure 4.15: Anomaly detection ROC curves for Operation-by-Operation CBAD Processing at SNR=0.0 dB operating point in Fig. 4.14. Results obtained using Hilbert magnitude sequences |H[x[n]]| for N_B=1000 bursts with the PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The EER_B < 10.0% benchmark is achieved for all devices.


Performance: The Hilbert Transform Feature Sequences with operation-by-operation CBAD processing were sufficiently robust for reliably detecting anomalous operating conditions. The TADR_B > 90.0% and EER_B < 10.0% benchmarks were achieved using N_B=1000 sequences for SNR > 0.0 dB, a 5.0 dB gain relative to performance using cross-operation CBAD processing.


4.6 Hardware Component Discrimination

Results in the preceding sections focused on software anomaly detection in PLC devices: discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc. A complementary application emerged as the research progressed, and the verification-based anomaly detection process was applied to support hardware component discrimination: discriminating between various hardware components to detect malfunctioning or counterfeit, trojan, etc., Integrated Circuits (ICs) such as those commonly used in PLCs.

Hardware component discrimination was assessed using the GRLVQI process as developed and verified in [76]; the process was implemented here as published, without modification. From a fundamental classification perspective, the GRLVQI model development and verification process is "signal agnostic" and can accept any collection of input sequences. Thus, for consistency and comparison with the previous software anomaly detection results, two input sequence types were considered for hardware component discrimination: 1) Time Domain (TD) feature sequences and 2) Correlation Domain (CD) feature sequences.

The TD RF-DNA sequences f_TD[n] were generated in the same manner described in Sect. 4.4 and contained N_D=156 features. The statistical features were generated using the RF-DNA process in Sect. 3.6.2 with TD x[n] sequences as inputs. GRLVQI performance was also assessed using CD feature sequences f_CD[n] generated using the Operation-by-Operation CBAD process in Sect. 3.9. Instead of creating a single CBAD statistic z_v, a vector of CBAD statistics {z_1v, z_2v, ..., z_10v} was generated from the N_Op=10 LLP operations. The resultant CBAD statistic sequences f_CD[n] = {z_1v, z_2v, ..., z_10v} were used as inputs for GRLVQI verification performance assessment.

4.6.1 GRLVQI Verification: TD Sequences. The initial hardware discrimination is performed using RF-DNA features extracted from the TD waveform sequences {x_1[n], x_2[n], ..., x_NB[n]}, N_B=1000. A total of N_Tng=500 TD waveform sequences are used for training, leaving N_Tst=500 TD waveforms for testing. The training and testing waveforms are combined with N_Nr=10 AWGN realizations per x[n] sequence.
Devices were divided into two arbitrary classes: 1) authorized hardware devices {WQ, WV, KV, OV, RG}, which are considered normal or non-anomalous, and 2) unauthorized rogue hardware devices {KG, QI, ZA, ZC, ZZ}, which are considered anomalous. In reality, all devices are assumed to be non-counterfeit and were purchased through normal commercial channels. For both Authorized Device Verification and Rogue Device Rejection assessment, the GRLVQI verification model was developed using only authorized device training sequences. In addition, when performing Rogue Device Rejection assessment, achieving the EER_B < 10.0% benchmark is equivalent to achieving a Rogue Rejection Rate (RRR) of RRR > 90.0%: at the EER operating point the fraction of rogue claims falsely accepted equals the EER, so the fraction rejected is 1 - EER.

RF-DNA features were extracted using the Feature Extraction and Statistical Fingerprint Generation processes outlined in Section 3.6.2 to create a Composite Fingerprint based on the waveform [11,76,103]. The Composite Fingerprint reduces the dimensionality of the sequence used in the anomaly detection process. The waveform sequences considered here are represented by an N_D=15880 dimensional vector, where the dimensionality is based on the sampling rate f_s and time length T_wf of the TD waveform, as given by (4.1).

The process graphically depicted in Fig. 3.11 results in an N_D=156 dimensional vector using N_R=12 subregions plus the total signal, N_Stat=4 statistical attributes per region, and N_Feat=3 signal attributes per region, as given by (4.2). The composite fingerprint feature vector serves as the input sequence to the anomaly detection process.

The Authorized Device Verification capability of GRLVQI processing was first evaluated using TD RF-DNA sequences f_TD[n] with the {WQ, WV, KV, OV, RG} PLCs serving as authorized devices, i.e., devices from which emission sequences are extracted and used for model development. Recall that the general verification process in Chapter 3 dictates that each device or operation has a claimed identity and an actual identity. Figure 4.16 shows Authorized Device Verification ROC curve results for SNR=15.0 dB using TD feature sequences f_TD[n] as input to the GRLVQI process; here the claimed and actual identities are equal. The ROC curve results measure how closely an authorized device resembles the other authorized devices in the test compared with how closely it resembles itself. A low EER indicates a device with a unique set of features that are rarely mistaken for features from another device. A high EER indicates a device whose features are similar to those of the other devices in the test.

For authorized device verification, the EER_B < 10.0% benchmark is achieved for all devices at SNR=15.0 dB.
Figure 4.16: GRLVQI hardware component discrimination ROC curves for Authorized Device Verification using the {WQ, WV, KV, OV, RG} PLCs with TD RF-DNA sequences f_TD[n]. Results for SNR=15.0 dB using N_B=1000 bursts with the PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The EER_B < 10.0% benchmark is achieved with EER < 4% for all devices.


The GRLVQI process was next evaluated to assess Rogue Device Rejection capability using TD RF-DNA sequences f_TD[n] with the {KG, QI, ZA, ZC, ZZ} PLCs serving as rogues, i.e., devices that have neither been previously seen nor used for authorized device model development. Figure 4.17 shows Rogue Device ROC curve results for SNR=15.0 dB using TD features as input to the GRLVQI process. The claimed and actual device IDs differ, and the ROC curve results are presented as DevX:DevY (Actual:Claimed) ID pairs. The GRLVQI process was evaluated with each of the five rogue devices presenting a claimed ID for each of the five authorized devices, giving a total of 5 × 5 = 25 DevX:DevY ID pairs. For visual clarity, the legend is not displayed in Fig. 4.17. The ROC curve results measure how closely a rogue device resembles an authorized device in the test. A low EER indicates that a rogue device is unlikely to be falsely verified as an authorized device; a high EER indicates that a rogue device is likely to be accepted as an authorized device.
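The 25 Actual:Claimed pairs can be enumerated directly. The sketch below also shows one hypothetical way to compute a Rogue Rejection Rate from verification scores, assuming a claim is accepted when its similarity score meets a threshold; the scoring itself (GRLVQI in this work) is not reproduced, and the function names are illustrative.

```python
from itertools import product

AUTHORIZED = ("WQ", "WV", "KV", "OV", "RG")
ROGUE = ("KG", "QI", "ZA", "ZC", "ZZ")


def rogue_claim_pairs(rogues=ROGUE, authorized=AUTHORIZED):
    """Every rogue device claiming every authorized identity, as 'Actual:Claimed'."""
    return [f"{actual}:{claimed}" for actual, claimed in product(rogues, authorized)]


def rogue_rejection_rate(similarity_scores, threshold):
    """Fraction of rogue claims rejected (score below threshold); hypothetical convention."""
    scores = list(similarity_scores)
    return sum(score < threshold for score in scores) / len(scores)
```

Evaluating each pair separately, rather than pooling all rogue claims, is what exposes a single hard case such as KG:WQ instead of averaging it away.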

For Rogue Device Rejection, the EER_B < 10.0% benchmark was achieved at SNR=15.0 dB for all device pairs. For all pairs except KG:WQ, EER < 3.0%; for the KG:WQ pair, EER ≈ 9.0%. This occurs because the RF-DNA features from device KG are most similar to the WQ RF-DNA features, so rogue KG is more likely to be falsely verified as authorized device WQ than any other rogue device is to be falsely verified as another authorized device.

4.6.2 GRLVQI Verification: CD Sequences. The second and final hardware discrimination evaluation is performed using CBAD statistics extracted from the Hilbert transform sequences {|H[x_1[n]]|, |H[x_2[n]]|, ..., |H[x_NB[n]]|}, N_B=1000, combined with N_Nr=10 AWGN realizations per x[n] sequence, for the authorized hardware devices ({WQ, WV, KV, OV, RG}) and rogue hardware devices ({KG, QI, ZA, ZC, ZZ}). As before, the term authorized devices refers to the set {WQ, WV, KV, OV, RG}, which are considered normal or non-anomalous, and the term rogue devices refers to the set {KG, QI, ZA, ZC, ZZ}, which are considered anomalous.

Figure 4.17: GRLVQI hardware component discrimination ROC curves for Rogue Device Rejection using TD RF-DNA sequences f_TD[n]. The DevX:DevY legend notation has been omitted for visual clarity. Results are for SNR=15.0 dB using N_B=1000 bursts with the PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The EER_B < 10.0% benchmark is achieved for all cases. The highest EER_KG:WQ ≈ 9.0% is for the pair KG:WQ, a consequence of KG and WQ RF-DNA features being most similar.

Previously, statistical RF-DNA features were successfully used as the input sequences for GRLVQI processing. Here, the input sequences are generated using the CBAD process to produce a set of CBAD CD statistics {z_1v, z_2v, ..., z_10v}, N_Op=10, with each CBAD statistic generated from the signal content of one of the operations in the LLP. Each CBAD statistic was generated using the process described in Section 4.5.2. As with RF-DNA feature extraction, the CBAD statistic process reduces the dimensionality of the sequence used in the anomaly detection process. The waveform sequences considered for this research are represented by an N_D=15880 dimensional vector, where the dimensionality is based on the sampling rate f_s and time length T_wf of the TD waveform, as given by (4.1). The CBAD statistic vectors are N_D=10 dimensional. Compared with the N_D=156 TD RF-DNA features, the CD CBAD features are less than 1/15th the size for identical input x[n] sequences.

The Authorized Device Verification capability of GRLVQI processing was next evaluated using CD sequences f_CD[n] with the {WQ, WV, KV, OV, RG} PLCs serving as authorized devices, i.e., devices from which emission sequences are extracted and used for model development. Figure 4.18 shows the authorized device ROC curves for SNR=15.0 dB with CD features input to the GRLVQI process. The claimed and actual identities are equal.

For authorized device verification, the EER_B < 10.0% benchmark is achieved for all devices except {KV, WV} at SNR=15.0 dB, because the CBAD statistic vectors for authorized devices KV and WV are similar to those of other authorized devices in the test.

The final GRLVQI assessment addressed Rogue Device Rejection capability using CD sequences f_CD[n] with the {KG, QI, ZA, ZC, ZZ} PLCs serving as rogue devices, i.e., devices that have neither been previously seen nor used for authorized device model development. Figure 4.19 shows rogue device ROC curve results for SNR=15.0 dB using CBAD statistic vectors as input to the GRLVQI process. As in the RF-DNA features case, there are a total of N_Perm=25 permutations of Actual:Claimed identity pairs for authorized device set {WQ, WV, KV, OV, RG} and rogue device set {KG, QI, ZA, ZC, ZZ}.

For rogue device detection, the EER_B < 10.0% benchmark is achieved for all device pairs at SNR=15.0 dB. As with RF-DNA features, the CBAD statistic vectors for device KG are most similar to those of the authorized devices. Device KG most closely resembles WQ, so rogue device KG is more likely to be falsely verified as authorized device WQ than as any other authorized device.
Figure 4.18: GRLVQI hardware component discrimination ROC curves for Authorized Device Verification using the {WQ, WV, KV, OV, RG} PLCs with CD feature sequences f_CD[n]. Results for SNR=15.0 dB using N_B=1000 bursts with the PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The EER_B < 10.0% benchmark is achieved for all devices except {KV, WV}, a consequence of KV and WV CD features being relatively similar to those of other authorized devices.


Performance: With the exception of authorized device verification using CBAD features for the {KV, WV} devices, GRLVQI processing using both TD RF-DNA and CD CBAD input sequences was effective for verifying authorized device IDs, with the EER_B < 10.0% benchmark achieved for SNR=15.0 dB. The {KV, WV} device CBAD features were insufficiently distinct from those of the other authorized devices. Both TD RF-DNA and CD CBAD input sequences were effective for Actual:Claimed rogue ID assessment, with the EER_B < 10.0% benchmark achieved for SNR=15.0 dB.
Figure 4.19: GRLVQI hardware component discrimination ROC curves for Rogue Device Rejection using CD sequences f_CD[n]. The DevX:DevY legend notation has been omitted for visual clarity. Results are for SNR=15.0 dB using N_B=1000 bursts with the PLCs operating under Norm, Anom #1, and Anom #2 conditions using N_Op=10 LLP operations. The highest EER_KG:WQ ≈ 9.0% is for the pair KG:WQ, a consequence of KG and WQ CD features being most similar.
5. Conclusion


This chapter summarizes the research activity and results for the development and demonstration of a verification-based anomaly detection approach that supports 1) software anomaly detection: discriminating between various operating conditions to detect malfunctioning or malicious software, firmware, etc., and 2) hardware component discrimination: discriminating between various hardware components to detect malfunctioning or counterfeit, trojan, etc., Integrated Circuits (ICs).

Section 5.1 provides a research summary supporting the results and conclusions for 1) the proposed Correlation Based Anomaly Detection (CBAD) process in Sect. 5.2, which was used to assess device operating condition discrimination, and 2) the Generalized Relevance Learning Vector Quantized-Improved (GRLVQI) process in Sect. 5.3, which was used to assess hardware component discrimination. The chapter concludes with recommendations for future research in Sect. 5.4, motivated by the research developments and demonstrations completed herein.

5.1  Research  Summary 

Supervisory Control And Data Acquisition (SCADA) systems remain vulnerable to malicious cyber attacks [13,33,44,60,93,94] and are an integral element of critical infrastructures in the United States and around the world. They are responsible for controlling activities ranging from wastewater treatment to nuclear power generation. Concern over these vulnerabilities is greatest given the critical nature of SCADA when integrated within an Industrial Control System (ICS). The current and previous US presidents have highlighted the critical nature of SCADA security through presidential directives and executive orders directing efforts toward securing critical infrastructure facilities and systems [5,66]; despite this motivation and related technical advancements, legacy SCADA systems remain vulnerable.
One key vulnerability rests within Programmable Logic Controller (PLC) devices
that are used to implement low-level SCADA and ICS functions such as operating
valves, monitoring temperatures, activating relays, etc. PLCs provided the
avenue through which recent SCADA cyber attacks have been orchestrated [12,105]
and are particularly vulnerable for two primary reasons: 1) PLCs run proprietary
Operating System (OS) software on limited/minimal hardware, which precludes
the use of Anti Virus (AV) or Intrusion Detection System (IDS) programs, and
2) PLC devices and implementation architectures may stay in operation for decades,
and the lack of upgrades keeps them vulnerable even as well-publicized exploits emerge.

The 7-layer Open Systems Interconnection (OSI) model provides a common means
for describing various levels of networked infrastructure functionality [7]. While
most methods for securing networked systems from attack reside within the upper
Network (NET) or Application (APP) model layers, this approach is problematic for
many fielded systems due to the limited on-board computing resources within PLC
devices. One avenue for augmenting Network/Application layer security is to exploit
information in the lower Physical (PHY) layer. This is one focus area
of AFIT's Radio Frequency Intelligence (RFINT) program, which has developed a
solid knowledge base on targeting and exploiting PHY layer attributes to address
bit-level security augmentation, device discrimination, and Side Channel Analysis
(SCA) [9-11,21,39-42,56-58,74,77,79,81,91,92,103].

The goal of this research was to expand AFIT's RFINT technology base by developing
and analyzing a process for reliably detecting anomalous activity in SCADA
PLC devices using PHY layer attributes. This was addressed using a verification-based
approach, built on the proposed CBAD process, for both software anomaly detection
and hardware component discrimination. The CBAD process was introduced to detect
behavior that differs from observed normal behavior by verifying normal operations
and detecting anomalous operations; it is a binary declaration process intended for
cases where a cause-independent determination of abnormal is desired. The CBAD
process is inherently sequence agnostic and was demonstrated for a variety of input
sequence types: Time Domain (TD) sequences [86,87], Radio Frequency Distinct Native
Attribute (RF-DNA) features [87], and Hilbert transformed TD sequences [88].
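As a concrete illustration of a sequence-agnostic, binary declaration, the sketch below correlates a test sequence with a single Norm reference and thresholds the result. It is a minimal sketch under stated assumptions: the Pearson correlation coefficient as the similarity measure, a fixed threshold, and the hypothetical helper names `pearson` and `cbad_declare`; it is not the exact CBAD statistic or threshold-selection rule developed in earlier chapters. Because it only requires two equal-length numeric sequences, the same code applies unchanged to TD, RF-DNA, or Hilbert transformed inputs.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def cbad_declare(test: Sequence[float], reference: Sequence[float],
                 threshold: float) -> str:
    """Binary, cause-independent declaration (assumed rule): 'normal' when the
    test sequence correlates with the Norm reference at or above threshold."""
    return "normal" if pearson(test, reference) >= threshold else "anomalous"


# Hypothetical example: a scaled copy of the reference verifies as normal,
# while a phase-shifted pattern is declared anomalous.
reference = [0.0, 1.0, 0.0, -1.0] * 4
print(cbad_declare([2.0 * v for v in reference], reference, 0.9))  # normal
print(cbad_declare([1.0, 0.0, -1.0, 0.0] * 4, reference, 0.9))     # anomalous
```

Note that the declaration says only "normal" or "anomalous"; it does not attempt to name the cause of a mismatch.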

An additional research contribution was made by leveraging previous GRLVQI [76,78]
and RF-DNA [9,11,77,81] research to assess hardware component discrimination
capability. In this case, the CBAD process was used to detect behavior that differs
from normal behavior by verifying authentic hardware devices and detecting rogue
hardware devices. The GRLVQI process was evaluated using both TD RF-DNA features
and Correlation Domain (CD) features.

The performance of verification-based software anomaly detection and hardware
component discrimination was assessed by 1) evaluating Signal-to-Noise
Ratio (SNR) vs. True Anomaly Detection Rate (TADR), 2) selecting a desired
TADR, and 3) generating a Receiver Operating Characteristic (ROC) curve at the
corresponding SNR. The resultant ROC curve Equal Error Rate (EER) point, i.e.,
the point at which the two errors associated with verification are equal, was arbitrarily
chosen for comparative assessment, as is common in biometric verification [48].
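To make the EER definition concrete, the sketch below sweeps a threshold over similarity scores from Norm and anomalous sequences and reports the operating point where the two verification errors are closest. Assumptions, not taken from the source: higher scores mean "more Norm-like", a sequence is declared normal when its score is at least the threshold, the two errors are labeled FRR (Norm wrongly declared anomalous) and FAR (anomaly wrongly declared normal), and the EER is taken as the mean of the two at the closest sampled point rather than by interpolation. The example scores are invented for illustration.

```python
from typing import List, Sequence, Tuple


def roc_points(normal_scores: Sequence[float],
               anomalous_scores: Sequence[float]) -> List[Tuple[float, float, float]]:
    """Return (threshold, FRR, FAR) for every observed score used as threshold.
    Assumed convention: score >= threshold -> declared normal."""
    thresholds = sorted(set(normal_scores) | set(anomalous_scores))
    points = []
    for t in thresholds:
        frr = sum(s < t for s in normal_scores) / len(normal_scores)
        far = sum(s >= t for s in anomalous_scores) / len(anomalous_scores)
        points.append((t, frr, far))
    return points


def equal_error_rate(normal_scores: Sequence[float],
                     anomalous_scores: Sequence[float]) -> float:
    """Approximate EER: mean of FRR and FAR where they are closest."""
    _, frr, far = min(roc_points(normal_scores, anomalous_scores),
                      key=lambda p: abs(p[1] - p[2]))
    return (frr + far) / 2


# Hypothetical, partially overlapping score sets.
print(equal_error_rate([0.6, 0.8, 0.9, 0.95], [0.5, 0.65, 0.7, 0.85]))  # 0.25
```

Under these conventions, the fraction of anomalous sequences correctly declared anomalous at a given threshold (a TADR-like quantity) is simply 1 - FAR.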

Assessment Criteria: The arbitrary performance benchmarks
for characterizing anomaly detection performance included
TADR_B > 90.0% and EER_B < 10.0%.


5.2  CBAD  Software  Anomaly  Detection 

A variety of input sequences were used to evaluate the CBAD process for software
anomaly detection using operating condition discrimination, each measured
against two arbitrary benchmarks: 1) the lowest SNR at which the given CBAD process
and input sequence type combination yielded TADR > 90.0%, and 2) ROC
curve EER < 10.0% for the CBAD process when calculated at the SNR for which
TADR > 90.0% is achieved. All processing included the addition of like-filtered Additive
White Gaussian Noise (AWGN) realizations that were power scaled to achieve
the desired SNR in the input sequences and, in the case of a single response, to simulate
multiple collected emissions.
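The noise-addition step can be sketched as follows: each realization is scaled so that its measured power yields exactly the requested SNR, and several independent realizations of one collected response stand in for multiple collected emissions. This is a minimal sketch assuming real-valued samples and SNR defined as the ratio of average signal power to average noise power; the like-filtering applied in the actual processing is omitted, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def avg_power(x: Sequence[float]) -> float:
    """Average power (mean squared value) of a real-valued sequence."""
    return sum(v * v for v in x) / len(x)


def add_awgn(x: Sequence[float], snr_db: float, rng: random.Random) -> List[float]:
    """Add one white Gaussian noise realization, power scaled so that
    10*log10(P_signal / P_noise) equals snr_db for this realization.
    Like-filtering of the noise is intentionally omitted (assumption)."""
    noise = [rng.gauss(0.0, 1.0) for _ in x]
    target_noise_power = avg_power(x) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / avg_power(noise))
    return [v + scale * w for v, w in zip(x, noise)]


def noisy_realizations(x: Sequence[float], snr_db: float, n_nz: int,
                       seed: int = 0) -> List[List[float]]:
    """Generate n_nz independent noisy copies of a single response."""
    rng = random.Random(seed)
    return [add_awgn(x, snr_db, rng) for _ in range(n_nz)]


# Hypothetical response: one sinusoid, ten noisy copies at 10 dB SNR.
x = [math.sin(2 * math.pi * k / 16) for k in range(256)]
copies = noisy_realizations(x, 10.0, 10)
```

Scaling each realization by its own measured power, rather than drawing noise with a nominal variance, removes realization-to-realization SNR scatter; whether the original processing did the same is not stated here.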

A total of five different types of sequences were input to CBAD processing to assess
software anomaly detection capability. All sequence types were used for cross-operation
CBAD processing assessment; as noted under Type #5, those sequences were also used for
operation-by-operation assessment.

1. Single TD Waveform Sequence: The |x[n]| sequence was derived from a given
Device Under Test (DUT) for each operating condition (Norm, Anom #1, and
Anom #2) generated using N_Op=5 Ladder Logic Program (LLP) operations
and N_Nz=200 AWGN realizations.

2. Multiple TD Waveform Sequences: A total of N_B=60 TD |x[n]| sequences were
derived from a single DUT for each operating condition (Norm, Anom #1, and
Anom #2) generated using N_Op=5 LLPs and N_Nz=10 AWGN realizations.

3. Multiple RF-DNA Feature Sequences: A total of N_B=60 RF-DNA feature sequences
were generated from TD waveform sequences x[n] collected from a
single DUT for each operating condition (Norm, Anom #1, and Anom #2)
generated using N_Op=5 LLPs and N_Nz=10 AWGN realizations.

4. Multiple Hilbert Transforms/Single DUT: A total of N_B=60 Hilbert transformed
sequences |H[x[n]]| were generated from a single DUT for each operating
condition (Norm, Anom #1, and Anom #2) generated using N_Op=5
LLPs and N_Nz=10 AWGN realizations.

5. Multiple Hilbert Transforms/Multiple DUTs: A total of N_B=1000 Hilbert
transformed sequences |H[x[n]]| were generated from N_Dev=10 DUTs for each
operating condition (Norm, Anom #1, and Anom #2) using N_Op=10 LLPs
and N_Nz=10 AWGN realizations for both cross-operation and operation-by-operation
CBAD processing assessment.


Results for cross-operation CBAD processing using Type #1 and Type #2
TD sequences were mixed. Type #1 sequences achieved the TADR_B > 90.0%
and EER_B < 10.0% benchmarks for all SNR ∈ [-30, 30] dB, whereas performance using
Type #2 TD sequences was considerably poorer, with TADR_B > 90.0% never achieved
over the same SNR range [87,89].

Performance: The untransformed TD sequences were insufficient
for reliably detecting anomalous operating conditions, and the desired
benchmark performance was not achieved using multiple bursts [87,89].

Given unacceptable performance using untransformed TD sequences, Type #3
RF-DNA feature sequences were evaluated next and their performance compared against
the benchmarks. Results here were favorable, with the TADR_B > 90.0% and EER_B < 10.0%
benchmarks achieved for SNR > 8.2 dB. However, these benchmarks were achieved
using a specific, manually selected reference sequence chosen by analyzing CBAD
performance for both Normal Verification and Anomaly Detection with each potential
reference sequence. This training approach is unrealistic for the intended purpose of
detecting unknown anomalies, but the results provide the most optimistic measure
of achievable performance. The CBAD process was subsequently retrained using
only the Norm sequences, and the resultant CBAD processing failed to meet the
TADR_B > 90.0% benchmark for all SNR ∈ [-30.0, 30.0] dB.
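The distinction between manual reference selection and Norm-only training can be illustrated with a sketch that sees only Norm sequences: it picks as reference the Norm sequence with the highest mean correlation to the others, then sets the decision threshold at a low quantile of the Norm-to-reference correlations. The medoid-style reference choice, the quantile rule, and the name `train_from_norm` are hypothetical illustrations, not the retraining procedure used in this work. In the example, one atypical Norm sequence pulls a low-quantile threshold down to zero, illustrating how sensitive such a rule is to the Norm training data.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def train_from_norm(norm_seqs: List[Sequence[float]],
                    quantile: float = 0.05) -> Tuple[Sequence[float], float]:
    """Hypothetical Norm-only training: choose the reference and threshold
    without ever looking at anomalous sequences."""
    if len(norm_seqs) < 2:
        raise ValueError("need at least two Norm sequences")

    def mean_corr(i: int) -> float:
        others = [s for j, s in enumerate(norm_seqs) if j != i]
        return sum(pearson(norm_seqs[i], s) for s in others) / len(others)

    best = max(range(len(norm_seqs)), key=mean_corr)
    ref = norm_seqs[best]
    rhos = sorted(pearson(ref, s) for j, s in enumerate(norm_seqs) if j != best)
    # Low-quantile threshold: roughly (1 - quantile) of Norm data verifies as normal.
    threshold = rhos[int(quantile * (len(rhos) - 1))]
    return ref, threshold


# Hypothetical Norm set: three consistent sequences plus one atypical one.
base = [0.0, 1.0, 0.0, -1.0] * 4
norm = [base, [2.0 * v for v in base], [3.0 * v + 1.0 for v in base],
        [1.0, 0.0, -1.0, 0.0] * 4]
ref, thr = train_from_norm(norm)
print(thr)  # 0.0
```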

Performance: The RF-DNA Feature Sequences were insufficient for
reliably detecting anomalous operating conditions, and the desired
benchmark performance was not achieved using multiple bursts [87,89].

The Hilbert transform-based Type #4 and Type #5 sequences were considered next,
given that Hilbert transforms have been successfully used in audio processing
applications to stabilize signal amplitude estimates [32,71]. Type #4 results
were favorable, with the TADR_B > 90.0% and EER_B < 10.0% benchmarks achieved for
SNR > 0.0 dB. While likewise favorable, Type #5 results were somewhat poorer,
with the TADR_B > 90.0% and EER_B < 10.0% benchmarks achieved for SNR > 5.0 dB
when using cross-operation CBAD processing.

Performance: The Hilbert Transform Feature Sequences with cross-operation
CBAD processing were sufficiently robust for reliably detecting
anomalous operating conditions. The desired TADR_B > 90.0%
and EER_B < 10.0% performance benchmarks were achieved using
1) N_B=60 sequences for SNR > 0.0 dB, and 2) N_B=1000 sequences for
SNR > 5.0 dB.
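The amplitude-stabilizing role of the Hilbert transform can be illustrated through the analytic signal x[n] + jH[x[n]], whose magnitude is a smooth envelope of x[n]. The sketch below computes the discrete Hilbert transform with the usual DFT construction (keep DC and Nyquist, double positive frequencies, zero negative frequencies). It uses a naive O(N^2) DFT so that it runs with the standard library only; practical code would use an FFT. Whether |H[x[n]]| in this work denotes this envelope or the magnitude of the Hilbert transform alone is not restated in this chapter, so both are provided; the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence


def _dft(x: Sequence[complex], sign: int) -> List[complex]:
    """Naive DFT (sign=-1) or unnormalized inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * k * m / n) for m in range(n))
            for k in range(n)]


def analytic_signal(x: Sequence[float]) -> List[complex]:
    """x[n] + j*H[x[n]] via the one-sided spectrum construction."""
    n = len(x)
    spectrum = _dft(x, -1)
    weights = [0.0] * n
    weights[0] = 1.0                      # keep DC
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep Nyquist bin for even n
        for k in range(1, n // 2):
            weights[k] = 2.0              # double positive frequencies
    else:
        for k in range(1, (n + 1) // 2):
            weights[k] = 2.0
    one_sided = [s * w for s, w in zip(spectrum, weights)]
    return [v / n for v in _dft(one_sided, +1)]


def hilbert_transform(x: Sequence[float]) -> List[float]:
    """H[x[n]]: imaginary part of the analytic signal."""
    return [z.imag for z in analytic_signal(x)]


def hilbert_envelope(x: Sequence[float]) -> List[float]:
    """|x[n] + j*H[x[n]]|: instantaneous amplitude (envelope)."""
    return [abs(z) for z in analytic_signal(x)]


# A cosine completing whole cycles has a flat envelope equal to its amplitude.
N = 64
x = [3.0 * math.cos(2 * math.pi * 4 * m / N) for m in range(N)]
env = hilbert_envelope(x)
```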


The final CBAD performance evaluation was performed using Type #5 sequences
with operation-by-operation CBAD processing to assess anomaly detection
capability. In this case, the sequences were divided into N_Reg=N_Op=10 regions based
on the number of samples within each operation region. The resultant Type #5 sequence
assessment met the TADR_B > 90.0% and EER_B < 10.0% benchmarks
at SNR > 0.0 dB. Relative to the cross-operation CBAD processing results introduced
earlier, this represents a “gain” of 5.0 dB in performance, measured here as
the reduction in required SNR, expressed in dB, for two methods based on identical
inputs to achieve the same benchmark performance.

Performance: The Hilbert Transform Feature Sequences with
operation-by-operation CBAD processing were sufficiently robust for
reliably detecting anomalous operating conditions. The TADR_B > 90.0%
and EER_B < 10.0% benchmarks were achieved using N_B=1000
sequences for SNR > 0.0 dB, a 5.0 dB gain relative to performance using
cross-operation CBAD processing.
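The difference between cross-operation and operation-by-operation processing can be illustrated with a sketch that splits each sequence into N_Reg contiguous regions, correlates test and reference region by region, and declares an anomaly if any region falls below the threshold. The equal-length split (standing in for the per-operation sample counts), the any-region decision rule, and the helper names are assumptions for illustration. In the example, a change confined to one of ten regions still leaves a whole-sequence correlation of 0.9, whereas the affected region's correlation drops to zero. The reported gain itself is simple arithmetic: 5.0 dB (cross-operation) minus 0.0 dB (operation-by-operation) gives 5.0 dB.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_regions(x: Sequence[float], n_reg: int) -> List[Sequence[float]]:
    """Split x into n_reg contiguous, (nearly) equal-length regions."""
    bounds = [round(i * len(x) / n_reg) for i in range(n_reg + 1)]
    return [x[bounds[i]:bounds[i + 1]] for i in range(n_reg)]


def op_by_op_declare(test: Sequence[float], reference: Sequence[float],
                     threshold: float, n_reg: int) -> Tuple[str, List[float]]:
    """Correlate region by region; any region below threshold -> 'anomalous'."""
    rhos = [pearson(t, r) for t, r in
            zip(split_regions(test, n_reg), split_regions(reference, n_reg))]
    return ("anomalous" if min(rhos) < threshold else "normal"), rhos


reference = [0.0, 1.0, 0.0, -1.0] * 10          # ten identical toy "operations"
test = list(reference)
test[12:16] = [1.0, 0.0, -1.0, 0.0]             # alter only region index 3
print(round(pearson(test, reference), 6))       # 0.9 (whole sequence)
print(op_by_op_declare(test, reference, 0.85, 10)[0])  # anomalous
```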


5.3  GRLVQI  Hardware  Component  Discrimination 

Two different input sequences were considered for GRLVQI processing: TD
Statistical RF-DNA Features and CD Statistical CBAD Features. GRLVQI processing
enabled Dimensional Reduction Analysis (DRA) such that the original 15880-dimensional
input TD waveform sequences x[n] were reduced to N_DRA=156 dimensional
RF-DNA Feature Sequences and N_DRA=10 dimensional CBAD Feature Sequences
based on GRLVQI feature relevance rankings.
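Relevance-based DRA amounts to ranking features by their learned relevance weights and keeping the top N_DRA. The sketch below assumes a relevance vector has already been produced by a trained GRLVQI model (not shown); the helper names and the example weights are hypothetical.

```python
from typing import List, Sequence


def dra_select(relevance: Sequence[float], n_dra: int) -> List[int]:
    """Indices of the n_dra most relevant features, most relevant first.
    Ties keep their original feature order (sorted() is stable)."""
    if not 0 < n_dra <= len(relevance):
        raise ValueError("n_dra must be between 1 and the number of features")
    order = sorted(range(len(relevance)), key=lambda i: relevance[i], reverse=True)
    return order[:n_dra]


def reduce_features(x: Sequence[float], keep: Sequence[int]) -> List[float]:
    """Project a full feature vector onto the retained feature indices."""
    return [x[i] for i in keep]


relevance = [0.10, 0.40, 0.05, 0.30, 0.15]   # hypothetical GRLVQI weights
keep = dra_select(relevance, 2)
print(keep)                                  # [1, 3]
print(reduce_features([10.0, 11.0, 12.0, 13.0, 14.0], keep))  # [11.0, 13.0]
```

The same retained index list must be applied to every training and test vector so that all reduced sequences share one feature space.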
The DRA input sequences were generated using TD waveform sequences collected
from N_Dev=10 PLC devices. To evaluate GRLVQI performance, the devices
were arbitrarily grouped into a set of five authorized devices {WQ, WV, KV,
OV, RG} and five rogue devices {KG, QI, ZA, ZC, ZZ}. GRLVQI processing
results were analyzed using ROC curves, with EER providing a single measure of
performance. ROC curves were used for making two assessments: 1) Authorized Device
Verification, an assessment of how discernible the authorized devices are from
each other, and 2) Rogue Device Detection, an assessment of how discernible a
non-authorized device is from each of the authorized devices. A single benchmark criterion
of EER_B < 10.0% was used to evaluate the GRLVQI process for the RF-DNA and
CD feature sequence inputs.
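In GRLVQ-family classifiers, comparisons use a relevance-weighted squared Euclidean distance, d(x, w) = sum_i lambda_i (x_i - w_i)^2, between a feature vector and a class prototype. The sketch below turns that distance into a claimed-identity check: a sequence claiming a given authorized device ID is accepted only if it lies within a threshold of that device's prototype. The acceptance rule, the threshold, the one-prototype-per-device setup, and all numbers are hypothetical. Sweeping the threshold over Actual:Claimed pairs would trace out ROC curves from which an EER can be read.

```python
from typing import Dict, Sequence


def relevance_distance(x: Sequence[float], w: Sequence[float],
                       lam: Sequence[float]) -> float:
    """Relevance-weighted squared Euclidean distance sum_i lam_i (x_i - w_i)^2."""
    return sum(l * (a - b) ** 2 for a, b, l in zip(x, w, lam))


def verify_claim(x: Sequence[float], claimed_id: str,
                 prototypes: Dict[str, Sequence[float]],
                 lam: Sequence[float], threshold: float) -> bool:
    """Accept the claimed device ID if x is close enough to its prototype
    (hypothetical decision rule)."""
    return relevance_distance(x, prototypes[claimed_id], lam) <= threshold


prototypes = {"WQ": [0.0, 0.0], "OV": [1.0, 1.0]}   # hypothetical prototypes
lam = [0.75, 0.25]                                  # hypothetical relevances
x = [0.1, 0.2]                                      # feature vector from device WQ
print(verify_claim(x, "WQ", prototypes, lam, 0.05))  # True: true claim accepted
print(verify_claim(x, "OV", prototypes, lam, 0.05))  # False: false claim rejected
```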

For authorized device ID verification, EER_B < 10.0% was achieved for all of
the authorized devices at SNR=15.0 dB using the TD RF-DNA sequences. Using the
CD CBAD input sequences for authorized device ID verification, EER_B < 10.0%
was achieved for three of the authorized devices at SNR=15.0 dB; devices {KV,
WV} were the exception, achieving only EER ≈ 18% (KV) and EER ≈ 24%
(WV) at the same SNR. Rogue device detection performance met the benchmark for
both input sequence types, with EER_B < 10.0% achieved for all of the Actual:Claimed
device pairs at the same SNR. The generally poorer performance for assessments
involving devices {KV, WV} was attributed to their CBAD features being similar to
those of the other authorized devices {WQ, OV, RG}.


Performance: With the exception of authorized device discrimination
assessments involving the CBAD features for the {KV, WV} devices,
GRLVQI processing using both TD RF-DNA and CD CBAD input
sequences was effective for verifying authorized device IDs, with the
EER_B < 10.0% benchmark achieved for SNR=15.0 dB. The {KV, WV}
device CBAD features were insufficiently distinct from those of the other
authorized devices. Both TD RF-DNA and CD CBAD input sequences
were effective for performing Actual:Claimed rogue ID assessment, with
the EER_B < 10.0% benchmark achieved for SNR=15.0 dB.
5.4 Future Research Recommendations

The research results here provide a proof-of-concept demonstration for employing the
proposed CBAD process in many anomaly detection applications, i.e., any binary
problem space where a cause-independent determination of abnormal is required.
Verification-based anomaly detection was performed here using 1) Hilbert transformed
sequences input to the CBAD process to assess software anomaly detection capability,
and 2) TD RF-DNA and CD features input to the GRLVQI process to assess hardware
component discrimination capability. The success of the demonstrations here provides
opportunities for expanding verification-based approaches, and several avenues
of future research are recommended.

1. Alternate Signal Transforms: The analysis here focused on Hilbert transform
and RF-DNA features derived from TD waveform responses. Actionable
verification and anomaly detection information may also reside in other
domains, including a) some that have been considered for other signal types
and applications, e.g., 1D Spectral Domain (SD) and various 2D Wavelet, Gabor,
etc., or b) some that have yet to be discovered. Features from these
alternate domains, and their impact on the CBAD and GRLVQI processes, could
be considered and may provide improvement relative to the Hilbert and RF-DNA
features considered here.

2. Extension to CBAD Far-Field Features: The CBAD features here were derived
exclusively from near-field emissions and used primarily for verification, with
some brief discussion of how classification may be implemented. A wide variety
of wireless signals have been considered in related classification and verification
research using far-field emission collections. Given that CBAD processing is
inherently sequence agnostic, CBAD features could be readily extracted from
far-field emissions to assess classification and verification. Wireless signals
in particular present a promising avenue for future investigation given their
standard-compliant, engineered waveform structure.
3. Alternate RF-DNA Region of Interest Selection and Segmentation: Currently,
RF-DNA features are extracted from identically sized, evenly distributed regions
within a waveform sequence, and these regional features are then concatenated
to form the complete RF-DNA sequence. By allowing the regions to be
arbitrarily defined, the calculation of signal attributes and statistical features
can be tailored to specific regions of the Intentional Radiated Emissions (IRE)
or Unintentional Radiated Emissions (URE). Assuming different signal paths
are used for different IRE and URE regions, the use of arbitrary regions allows
specific components within a device to be targeted, offering more potential for
uniquely identifying and discriminating between devices.
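A sketch of the recommended change: instead of equal-size regions, the caller supplies arbitrary region boundaries, and per-region statistics are concatenated into the feature sequence. The statistic set (standard deviation, variance, skewness, kurtosis), the population (biased) moment estimators, the all-zero convention for flat regions, and the function names are assumptions for illustration; the input could be any characteristic response, such as an instantaneous amplitude, phase, or frequency sequence.

```python
import math
from typing import List, Sequence


def region_stats(seg: Sequence[float]) -> List[float]:
    """[std, variance, skewness, kurtosis] using population moments."""
    n = len(seg)
    mu = sum(seg) / n
    var = sum((v - mu) ** 2 for v in seg) / n
    sd = math.sqrt(var)
    if sd == 0.0:
        return [0.0, 0.0, 0.0, 0.0]   # flat region: convention chosen here
    skew = sum((v - mu) ** 3 for v in seg) / (n * sd ** 3)
    kurt = sum((v - mu) ** 4 for v in seg) / (n * sd ** 4)
    return [sd, var, skew, kurt]


def rf_dna_arbitrary(x: Sequence[float], boundaries: Sequence[int]) -> List[float]:
    """Concatenate region statistics for regions [b_k, b_{k+1}) given by
    arbitrary, strictly increasing boundaries, e.g. [0, 40, 100, len(x)]."""
    if any(b1 <= b0 for b0, b1 in zip(boundaries, boundaries[1:])):
        raise ValueError("boundaries must be strictly increasing")
    features: List[float] = []
    for lo, hi in zip(boundaries, boundaries[1:]):
        features.extend(region_stats(x[lo:hi]))
    return features


# Hypothetical response with two differently behaving segments.
x = [1.0, -1.0] * 4 + [0.0, 2.0] * 3
print(rf_dna_arbitrary(x, [0, 8, 14]))   # [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
```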

4. Non-Binary Device Operation Assessment: Development of the binary anomaly
versus normal verification-based assessment process revealed that unique waveform
“shapes” can be directly attributed to device operations, e.g., PLC
execution of move (MOV) and square-root (SQR) commands produced distinct
emission responses. Additional research could leverage this unique operation-to-waveform
response mapping to identify and extract the embedded/programmed
code being executed by the device on an operation-by-operation basis.
A simple implementation may include parallel matched filtering, as
commonly used for digital communication symbol estimation [72], with each
parallel filter branch matched to a specific software operation response.
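The parallel matched-filter idea can be sketched as a bank of branches, one per operation template, each reporting its peak normalized correlation with an observed segment; the branch with the largest output names the operation. The templates labeled MOV and SQR below are invented toy waveforms, not measured PLC responses, and the energy normalization and helper names are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple


def matched_filter_peak(x: Sequence[float], template: Sequence[float]) -> float:
    """Peak energy-normalized output of a filter matched to `template`,
    evaluated at every lag where the template fully overlaps x."""
    m = len(template)
    t_norm = math.sqrt(sum(v * v for v in template))
    best = -math.inf
    for lag in range(len(x) - m + 1):
        window = x[lag:lag + m]
        w_norm = math.sqrt(sum(v * v for v in window))
        if w_norm == 0.0:
            continue                      # skip silent windows
        out = sum(a * b for a, b in zip(window, template)) / (t_norm * w_norm)
        best = max(best, out)
    return best


def identify_operation(x: Sequence[float],
                       templates: Dict[str, Sequence[float]]) -> Tuple[str, Dict[str, float]]:
    """Evaluate every branch of the bank and pick the strongest output."""
    scores = {op: matched_filter_peak(x, t) for op, t in templates.items()}
    return max(scores, key=scores.get), scores


templates = {"MOV": [1.0, 1.0, -1.0, -1.0], "SQR": [1.0, -1.0, 1.0, -1.0]}  # toy shapes
segment = [0.0, 0.0, 1.0, 1.0, -1.0, -1.0, 0.0]
print(identify_operation(segment, templates)[0])   # MOV
```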

5. Alternate IC Devices/Near-Field Probing: Research here was based solely on
emissions collected from the P80C32UFAA microcontroller on the PLC mainboard
using a single near-field probe. The research could be expanded by
considering a) emissions from an alternate IC on the PLC mainboard collected
with a single near-field probe, b) emissions from the same or an alternate IC on
the PLC mainboard collected using multiple near-field probes or a near-field probe
array, or c) emissions collected simultaneously from multiple ICs on the PLC
mainboard using either a single near-field probe or a near-field probe array.
6. Extension to Wired Emission/Waveform Responses: As developed and demonstrated,
the CBAD process is inherently sequence agnostic and can process sequences
derived from any signal, system, etc., including emissions/waveforms
associated with network traffic. There is ongoing SCADA field bus assessment
work at AFIT, and related Ethernet device work outside of AFIT [26], that may
benefit from CBAD processing and that could prove valuable for identifying
undesired, potentially malicious activity.

7. Extension to Environmental Effects: As developed and demonstrated, the
CBAD process focuses on software and hardware anomalies based on RF
emissions from IC devices. In addition to changes due to the anomalies mentioned
here, factors such as temperature, device age, and humidity may also
alter the collected RF emissions. Features collected in varied environments could be
considered to evaluate the performance of the CBAD process under varying
environmental conditions.